FollowFox blog

Dreambooth Face Training Experiments - 25 Combos of Learning Rates and Steps

Learn AI

Overview and Setup

Experiment Idea and Goals

We tried astria.ai (post link) a few days ago and got some easy and impressive results. This time we are trying to replicate or improve those results by fine-tuning the Stable Diffusion model on our local machine. There are a bunch of different approaches and settings that can be adjusted, but this time, we focused on combinations of different learning rates and training steps. These are the 25 combinations we tried:

Experiment Details

We used Dreambooth GUI for this experiment (will retry with WebUI or other versions soon). Read how to use the same GUI if you want to replicate the experiment link. More details:
  • It fine-tuned the “sd-v1-4-full-ema” SD model.
  • We used the same 20 training images as for astria.ai - this time cropped and resized to 512x512
  • For instance prompt, we use the word “sks” and for the class prompt, “person”.
  • The GUI generated 100 regularization images of the person.
  • For the training arguments, we used the following lines
--mixed_precision=fp16
--train_batch_size=1
--gradient_accumulation_steps=1
--use_8bit_adam
--resolution=512
--gradient_checkpointing
--train_text_encoder

Judging Criteria

Since we wanted to match astria.ai results, we downloaded the checkpoint from astria and generated two sets of comparison images. For our experimental models, we use the exact same prompts, settings, and seeds to generate the images. For rating each of the fine-tuned models, we used a somewhat subjective three-point scale:
  • Bad = nonsense or not even close to astria
  • Medium = Some resemblance but not as good as astria
  • Good = On par or better than astria
Here are two sets of comparison images based on astria model:
  • Pencil Sketch:
  • Prompt: sks person, pencil sketch of a teenage boy with short side part light hair smiling trending on artstation
  • Settings: Steps: 40, Sampler: Euler a, CFG scale: 7.5, Seed: 1991, Size: 512x512
Keanu:

  • Prompt: highly detailed portrait of sks person as keanu reeves in gta v, stephen bliss, unreal engine, fantasy art by greg rutkowski, loish, rhads, ferdinand knab, makoto shinkai and lois van baarle, artgerm, pixar, ilya kuvshinov, rossdraws, tom bagshaw, global illumination, radiant light, detailed and intricate environment
  • Settings: Steps: 50, Sampler: Euler a, CFG scale: 7, Seed: 4243591472, Size: 512x512


Summary of Results and Findings

Overview of output ratings

This is subjective, but we don’t think we found a combination that works as well as what astria.ai managed to get. Here is a summary of our ratings, more details, and all the outputs in the detailed section below.
TLDR is that learning rates higher than 2.00E-06 seem irrelevant in this case and that with lower learning rates, more steps seem to be needed until some point.
Overall I’d say model #24, 5000 steps at a learning rate of 1.00E-06, performed the best

Other Findings

A few other interesting things that we noticed that might be useful if properly interpreted:
  • Loss values at the final step:
  • In general, with more training steps, the final loss value goes down (the exception was model #21).
  • The absolute value of the final loss doesn’t seem to correlate with the fine-tuning quality.
  • At each given step quantity, the final loss number seems to go up when we decrease the learning rate. This logic seems to break at a step count below 3000.
Training time doesn’t seem to be impacted by the learning rate and is directly correlated to the number of steps. ie, each step takes the exact same time.

What to try next

  • Other more flexible Dreambooth training methods, like the Webui addon
  • Better class regularization image set
  • Less training images
  • More variation in 2.00E-06 and lower learning rates
  • SD 1.5 instead of 1.4

Detailed Results

Model 1: 700 Steps @ 1.00E-05.
  • Pencil: Nonsense with some resemblance in a few crazy images
  • Keanu: Nonsense, with a bit of my face
Model 2: 1200 Steps @ 1.00E-05.

  • Pencil: Some cartoony-looking sketches
  • Keanu: Weird random faces


Model 3: 3000 Steps @ 1.00E-05.

  • Pencil: weird faces, no details of anyone
  • Keanu: More colorful nonsense
Model 4: 5000 Steps @ 1.00E-05.

  • Pencil: nonsense images barely resembling faces
  • Keanu: Blue nonsense
Model 5: 8000 Steps @ 1.00E-05.

  • Pencil: nonsense images barely resembling faces
  • Keanu: More colorful nonsense
Model 6: 700 Steps @ 6.00E-06

  • Pencil: At least clear faces, some in pencil but nothing like a trained face
  • Keanu: Some random cartoony faces with beards


Model 7: 1200 Steps @ 6.00E-06

  • Pencil: A few pencil sketches with some similarities to trained face but overall, not great
  • Keanu: Realistic, looking like training faces, but no Keanu
Model 8: 3000 Steps @ 6.00E-06

  • Pencil: No pencil sketches; faces look like trained ones but with some distortion
  • Keanu: Realistic-looking trained image bodies but no Keanu, with some artifacts
Model 9: 5000 Steps @ 6.00E-06

  • Pencil: Some images are all dark, and some have trained images with realistic faces with issues
  • Keanu: Most images are nothing; a few resemble the realistic trained face
Model 10: 8000 Steps @ 6.00E-06

  • Pencil: Some random faces, not pencil, not realistic
  • Keanu: Weird portraits with a bit of similarity to the trained face
Model 11: 700 Steps @ 3.33E-06

  • Pencil: All pencil sketches, some with a bit of similarity to the trained face
  • Keanu: A bit of promise here but nowhere close to astria
Model 12: 1200 Steps @ 3.33E-06

  • Pencil: Close to good but similarities are not on point
  • Keanu: It already seems overtrained, mostly the trained face


Model 13: 3000 Steps @ 3.33E-06

  • Pencil: Too realistic trained faces with some artifacts
  • Keanu: Too realistic trained faces with some artifacts
Model 14: 5000 Steps @ 3.33E-06

  • Pencil: Lot of artifacts and blurry images
  • Keanu: Lot of artifacts and blurry images
Model 15: 8000 Steps @ 3.33E-06

  • Pencil: just gray noise
  • Keanu: Just gray noise with a few shapes looking like training photos
Model 16: 700 Steps @ 2.00E-06

  • Pencil: A much better version of 11
  • Keanu: Some promise here but nowhere close to astria
Model 17: 1200 Steps @ 2.00E-06

  • Pencil: Almost good, but still no really good ones
  • Keanu: Bearly medium, too much of the trained face
Model 18: 3000 Steps @ 2.00E-06

  • Pencil: Close to medium, but similarities went down compared to 16 and 17
  • Keanu: Too much of a trained face, overtrained
Model 19: 5000 Steps @ 2.00E-06

  • Pencil: Overtrained, no more pencil sketches, and with some artifacts
  • Keanu: Lot of artifacts and realistic faces of training images
Model 20: 8000 Steps @ 2.00E-06

  • Pencil: Too realistic training faces with some artifacts
  • Keanu: Lot of artifacts and realistic faces of training images
Model 21: 700 Steps @ 1.00E-06

  • Pencil: A worse version of 11
  • Keanu: just Keanu
Model 22: 1200 Steps @ 1.00E-06

  • Pencil: A better version of 11
  • Keanu: Not much of a trained face there
Model 23: 3000 Steps @ 1.00E-06

  • Pencil: Decent but not as similar as the Astria version
  • Keanu: Now this seems undertrained, mostly Keanu and a bit of the trained face
Model 24: 5000 Steps @ 1.00E-06

  • Pencil: Astria level performance; hard to say which one is better
  • Keanu: Better than 25 but not as good as Astria
Model 25: 8000 Steps @ 1.00E-06

  • Pencil: Slightly overtrained; 24th is probably the best overall
  • Keanu: Seems overtrained. I see both the trained face and Keanu but not a single good one
Made on
Tilda