About FollowFox
followfox.ai is an AI exploratory initiative of the boutique marketing agency FollowFox.org.
Until AI takes over, FollowFox.org offers a full range of marketing services at boutique quality by top talent in the region.
Jump to a specific section:
- Experiment Idea and Goals
- Experiment Details and the Optimal Training Protocol so Far (you can replicate this)
- Judging Criteria
- Summary of Results and Findings
- Detailed Results
- Future experiments
Overview and Setup
Experiment Idea and Goals
In the previous experiment, we tried Dreambooth using a local GUI with 25 combinations of learning rates and steps (link). Unfortunately, back then, we didn't get results impressive enough to match our original target, astria.ai (read about our experience with them here: link).
This time, we got some impressive results, at least on par with astria. Check out the details below.
Experiment Details and the Optimal Training Protocol so Far
This time, we used the Dreambooth extension of Automatic1111's WebUI on our local machine. Check out the installation details in our post (link).
In the Dreambooth extension, the first step is to create a model. The setup we used:
- Name: doesn't matter; use whatever you like
- Source Checkpoint: We used the official v1-5-pruned.ckpt (link)
- Scheduler: ddim
The next step is to set the training parameters. Our settings:
- Training Steps: 10,000. We saved a checkpoint every 1,000 steps. If you just want a recommendation, train the face for 2,000 steps on 20 photos.
- Training Epochs: doesn't matter, as steps override this setting
- Save Checkpoint Frequency: 1,000
- Save Preview(s) Frequency: no need, but we had it at 500
- Learning Rate: 0.000001
- Scale Learning Rate: unchecked
- Learning Rate Scheduler: constant
- Learning Rate Warmup Steps: 0
- Resolution: 512, since we used images resized to 512x512
- Center Crop: unchecked
- Apply Horizontal Flip: checked
- Pretrained VAE Name or Path: blank
- Use Concepts List: unchecked
- Dataset Directory and Classification Dataset Directory: whatever directories you have.
- Classification data: We used 1,500 person_ddim images from JoePenna's repo (link)
- Existing Prompt Contents: Description
- Instance prompt: photo of sks person
- Class Prompt: photo of a person
- Sample Image Prompt/Instance Token/Class Token: all blank
- Class Images: 1500
- Classification Image Negative Prompt: blank, but doesn't matter since we are using our own regularization image set
- Classification CFG Scale: 7.5 but doesn't matter since we are using our own regularization image set
- Classification Steps: 40 but doesn't matter since we are using our own regularization image set
- Sample Images: none of these settings matter, but ours were: Sample Image Negative Prompt: blank; Number of Samples to Generate: 1; Sample Seed: 0; Sample CFG Scale: 7.5; Sample Steps: 40
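As a rough sanity check on the schedule above, here is a minimal Python sketch. The `TrainingPlan` class and its methods are hypothetical helpers, not part of the Dreambooth extension; `num_instance_images=20` is an assumption taken from the 20-photo recommendation, and the warmup branch only illustrates what a non-zero Learning Rate Warmup Steps value would do under a simple linear-warmup assumption.

```python
from dataclasses import dataclass


@dataclass
class TrainingPlan:
    """Hypothetical summary of the training settings listed above."""

    max_steps: int = 10_000        # Training Steps
    save_every: int = 1_000        # Save Checkpoint Frequency
    learning_rate: float = 1e-6    # Learning Rate: 0.000001
    warmup_steps: int = 0          # Learning Rate Warmup Steps (constant scheduler)
    num_instance_images: int = 20  # assumption: the 20-photo recommendation

    def checkpoint_steps(self) -> list[int]:
        """Steps at which a checkpoint is saved, up to and including max_steps."""
        return list(range(self.save_every, self.max_steps + 1, self.save_every))

    def steps_per_image(self, steps: int) -> float:
        """Training steps divided by the number of instance photos."""
        return steps / self.num_instance_images

    def lr_at(self, step: int) -> float:
        """Constant learning rate, with an assumed linear ramp during warmup."""
        if step < self.warmup_steps:
            return self.learning_rate * step / self.warmup_steps
        return self.learning_rate


plan = TrainingPlan()
print(plan.checkpoint_steps(), plan.steps_per_image(2000), plan.lr_at(5000))
```

With the defaults, `checkpoint_steps()` yields the ten saves from 1,000 to 10,000 steps, and `steps_per_image(2000)` gives the 100 steps per image behind the 2,000-step recommendation.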
Advanced Settings:
- Batch Size: 1
- Class Batch Size: 1
- Use CPU Only (SLOW): unchecked
- Use 8bit Adam: checked
- Mixed Precision: fp16
- Memory Attention: default
- Don't Cache Latents / Train Text Encoder / Train EMA: all 3 checked
- Shuffle After Epoch: unchecked
- Pad Tokens: Checked
- Max Token Length (Requires Pad Tokens for > 75): 75
- Gradient Checkpointing: Checked
- Gradient Accumulation Steps: 1
- Max Grad Norms: 1
- Adam Beta 1: 0.9
- Adam Beta 2: 0.999
- Adam Weight Decay: 0.01
- Adam Epsilon: 1e-8
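To make the optimizer-related settings concrete, the sketch below applies one optimizer step to plain Python floats. It assumes AdamW-style decoupled weight decay (as in PyTorch's `torch.optim.AdamW`) and global-norm gradient clipping for Max Grad Norms; the function names are hypothetical, not the extension's code. The real 8-bit Adam option additionally keeps the moment estimates in 8-bit precision, which this full-precision sketch ignores.

```python
import math


def clip_grad_norm(grads: list[float], max_norm: float = 1.0) -> list[float]:
    """Rescale gradients so their global L2 norm is at most max_norm (Max Grad Norms)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)
        return [g * scale for g in grads]
    return list(grads)


def adamw_step(params, grads, m, v, t, lr=1e-6, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update on lists of floats; t is the 1-based step number."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        p = p * (1 - lr * weight_decay)            # decoupled weight decay (Adam Weight Decay)
        m_i = beta1 * m_i + (1 - beta1) * g        # first moment (Adam Beta 1)
        v_i = beta2 * v_i + (1 - beta2) * g * g    # second moment (Adam Beta 2)
        m_hat = m_i / (1 - beta1 ** t)             # bias correction for early steps
        v_hat = v_i / (1 - beta2 ** t)
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)  # Adam Epsilon avoids division by zero
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Usage: clip, then take the first optimizer step for a single parameter
grads = clip_grad_norm([0.5])
params, m, v = adamw_step([1.0], grads, [0.0], [0.0], t=1)
print(params, m, v)
```

On the first step, the bias-corrected update is roughly `lr` times the sign of the gradient, which is one reason such a small learning rate (1e-6) keeps each step gentle.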
Judging Criteria
Judging was similar to last time, with a few new details. We added a third prompt to compare, Superman, so we now have three: Pencil Sketch, Keanu, and Superman. For all 10 models plus astria, we scored each of the 8 generated images per prompt as bad (0 points), good (1 point), or mixed (0.5 points). So each model can score up to 8 points per category and up to 24 points in total.
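The scoring rule can be written down as a small function. The rating labels and the example ratings below are made up for illustration and are not our actual scores; the function names are hypothetical. The per-category maximum simply equals the number of images rated.

```python
# Points per rating, following the judging rule
RATING_POINTS = {"bad": 0.0, "mixed": 0.5, "good": 1.0}


def category_score(ratings: list[str]) -> float:
    """Sum the points for one prompt's generated images."""
    return sum(RATING_POINTS[r] for r in ratings)


def model_score(ratings_by_prompt: dict[str, list[str]]) -> tuple[dict[str, float], float, float]:
    """Return per-category scores, the total, and the maximum possible total."""
    per_category = {p: category_score(r) for p, r in ratings_by_prompt.items()}
    max_total = float(sum(len(r) for r in ratings_by_prompt.values()))
    return per_category, sum(per_category.values()), max_total


# Hypothetical ratings for one model (8 images per prompt), for illustration only
example = {
    "Pencil Sketch": ["good", "mixed", "bad", "good", "bad", "bad", "mixed", "good"],
    "Keanu": ["bad"] * 6 + ["good", "mixed"],
    "Superman": ["good"] * 4 + ["bad"] * 4,
}
per_category, total, max_total = model_score(example)
print(per_category, total, max_total)
```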
Exact prompts and settings used for judging images:
Pencil Sketch:
- Prompt: sks person, pencil sketch of a teenage boy with short side part light hair smiling trending on artstation
- Settings: Steps: 40, Sampler: Euler a, CFG scale: 7.5, Seed: 1991, Size: 512x512
Keanu:
- Prompt: highly detailed portrait of sks person as keanu reeves in gta v, stephen bliss, unreal engine, fantasy art by greg rutkowski, loish, rhads, ferdinand knab, makoto shinkai and lois van baarle, artgerm, pixar, ilya kuvshinov, rossdraws, tom bagshaw, global illumination, radiant light, detailed and intricate environment
- Settings: Steps: 50, Sampler: Euler a, CFG scale: 7, Seed: 4243591472, Size: 512x512
Superman:
- Prompt: photo of sks person as superman, detailed face, sharp focus, canon 5d, 50mm, high-resolution
- Negative prompt: illustration, painting, drawing, ((deformed face)), low resolution, grain
- Settings: Steps: 80, Sampler: Euler a, CFG scale: 9.5, Seed: 3783255463, Face restoration: GFPGAN, Size: 512x512, Batch size: 8, Batch pos: 1
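If you want to regenerate the judging images without clicking through the UI, the WebUI exposes a REST endpoint, `/sdapi/v1/txt2img`, when launched with the `--api` flag. The sketch below builds request payloads from the settings above. It assumes the default local address, a batch size of 8 for all three prompts (matching the 8 images per prompt we scored), and that GFPGAN is already selected as the face restoration model in the WebUI settings, since `restore_faces` only switches restoration on. The helper names are hypothetical, and the model checkpoint is whatever is currently loaded in the WebUI.

```python
import json
import urllib.request

# Judging prompts and settings, copied from the lists above
JUDGING_PROMPTS = {
    "pencil_sketch": {
        "prompt": "sks person, pencil sketch of a teenage boy with short side part "
                  "light hair smiling trending on artstation",
        "steps": 40, "cfg_scale": 7.5, "seed": 1991,
    },
    "keanu": {
        "prompt": "highly detailed portrait of sks person as keanu reeves in gta v, stephen bliss, "
                  "unreal engine, fantasy art by greg rutkowski, loish, rhads, ferdinand knab, "
                  "makoto shinkai and lois van baarle, artgerm, pixar, ilya kuvshinov, rossdraws, "
                  "tom bagshaw, global illumination, radiant light, detailed and intricate environment",
        "steps": 50, "cfg_scale": 7, "seed": 4243591472,
    },
    "superman": {
        "prompt": "photo of sks person as superman, detailed face, sharp focus, canon 5d, "
                  "50mm, high-resolution",
        "negative_prompt": "illustration, painting, drawing, ((deformed face)), low resolution, grain",
        "steps": 80, "cfg_scale": 9.5, "seed": 3783255463,
        "restore_faces": True,  # assumes GFPGAN is the selected restoration model
    },
}


def build_payload(name: str, batch_size: int = 8) -> dict:
    """Merge shared defaults with one prompt's settings into a txt2img request body."""
    payload = {
        "negative_prompt": "",
        "sampler_name": "Euler a",
        "width": 512,
        "height": 512,
        "batch_size": batch_size,
        "restore_faces": False,
    }
    payload.update(JUDGING_PROMPTS[name])
    return payload


def request_images(payload: dict, url: str = "http://127.0.0.1:7860/sdapi/v1/txt2img") -> list:
    """POST the payload to a running WebUI and return base64-encoded images."""
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["images"]
```

Calling `request_images(build_payload("superman"))` against a running WebUI should return eight images for that prompt; the payload builder itself needs no server.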
Summary of Results and Findings
Overview of output ratings
We think that this time we managed to match astria's results, though this is subjective and ours are not meaningfully better.
TL;DR: 2,000 steps can get you pretty good results (coincidentally matching the magic formula of 100 steps per image), but going to 4,000 steps or even higher can be better for some specific use cases. Here is the summary of the results:

Bonus: Loss Graph
It's interesting to look at, although I don't yet have the expertise to analyze it properly. Still, I will be monitoring these graphs in future experiments and hopefully can draw some conclusions later on.
For now, it's interesting to observe the chaos in the early steps; maybe that's why warmup could be a good idea. Interestingly, there is also a drop at ~2k steps; maybe that explains the good results there? I'm also tempted to try exactly 2,220 steps (the lowest point) and see if it beats the 2k-step model. Please comment if you want me to write a post on how to open these graphs.
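For anyone who wants to look for the dip themselves, here is a minimal sketch. It assumes you have exported the logged loss as lists of steps and values (for example, via TensorBoard's data download); it smooths the curve with a plain exponential moving average, similar in spirit to TensorBoard's smoothing slider but not identical, and returns the lowest smoothed point. The function names are hypothetical.

```python
def ema_smooth(values: list[float], weight: float = 0.9) -> list[float]:
    """Exponential moving average; higher weight means a smoother curve."""
    smoothed, last = [], None
    for value in values:
        last = value if last is None else weight * last + (1 - weight) * value
        smoothed.append(last)
    return smoothed


def lowest_point(steps: list[int], losses: list[float], weight: float = 0.9) -> tuple[int, float]:
    """Return the step and smoothed loss at the minimum of the smoothed curve."""
    smoothed = ema_smooth(losses, weight)
    i = min(range(len(smoothed)), key=smoothed.__getitem__)
    return steps[i], smoothed[i]


# Toy numbers for illustration only, not our logged loss
print(lowest_point([0, 1, 2, 3, 4], [5.0, 4.0, 1.0, 3.0, 6.0], weight=0.5))
```

The location of the minimum shifts with the smoothing weight, so a point like 2,220 steps is best read as approximate.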

Detailed Results (ranked from best to worst)
1st place (split): 2,000 steps
At least a few decent images across all three prompts



1st place (split): astria.ai
Close to 2,000 steps overall: worse in some places and better in others



3rd place: 4,000 steps
Not bad overall, but a bit worse than the top two. 4k steps might be worth exploring a bit more.



4th place: 5,000 steps
Mostly worse than others, but some good(ish) Keanu results



5th place: 6,000 steps
Pretty much the same level of performance as 4th place



6th place: 3,000 steps
Strangely, 3k did not land between 2k and 4k; it did worse than both.



7th place: 10,000 steps
Over-trained, but interestingly, some results were better than those of the slightly lower step counts



8th place: 7,000 steps
Not much to say: 7k, 8k, and 9k are equally not great.



9th place: 8,000 steps
Not much to say: 7k, 8k, and 9k are equally not great.



10th place: 9,000 steps
Not much to say: 7k, 8k, and 9k are equally not great.



11th place: 1,000 steps
It’s just under-trained.



Future experiments
There is a lot more to try here, so please don’t hesitate to comment if you have a specific one in mind. Moving forward, we will use the optimal setting and change one thing at a time. A few posts coming soon:
- Comparing different versions of official SD as a base
- Comparing schedulers in the model setup
- Experimenting with non-constant learning rates
- tbd