We think all base models did a fairly good job. If you are looking for a single recommendation: use SD 1.5-pruned at 3,000 steps. If you want to experiment a bit with steps or other settings, SD 1.5-pruned-emaonly seems like the best option. And in general, 1.5 weights did better than the 1.4 ones.
Overview and Setup
One of the first decisions you must make before doing a Dreambooth fine-tune is which model to use as a base. We tested a few official ones and compared the results to each other. The main goal was to find the model with the best output quality, but we also tracked a few performance factors (file size, training time, etc.).
Setup Details
Overall, the setup was identical to the previous experiment (see the details link). The main difference is that we tried four different source checkpoints at three training step counts each (2k, 3k, 4k). Checkpoints tested:
To keep the judging criteria standardized, we used the same three concepts for each model (see the details link):
A realistic, high-quality photo
Stylized avatar
Fine-tuned subject depicted as Superman
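The comparison grid described above (four source checkpoints, three step counts, three concepts) can be sketched as follows. This is only an illustrative enumeration, not the tooling used in the experiment. The names `CHECKPOINTS`, `STEP_COUNTS`, `CONCEPTS`, and `build_grid` are hypothetical, and the fourth checkpoint is a placeholder because it is not named in this write-up.

```python
from itertools import product

# Source checkpoints. Three are named in the text; the fourth is a
# placeholder (assumption: it is not identified in this section).
CHECKPOINTS = [
    "SD 1.5-pruned",
    "SD 1.5-pruned-emaonly",
    "SD 1.4 Full ema",
    "fourth-checkpoint-placeholder",
]

# Training step counts at which each fine-tune was evaluated.
STEP_COUNTS = [2000, 3000, 4000]

# The three judging concepts used for every model.
CONCEPTS = [
    "A realistic, high-quality photo",
    "Stylized avatar",
    "Fine-tuned subject depicted as Superman",
]


def build_grid(checkpoints, step_counts, concepts):
    """Return one record per (checkpoint, steps, concept) combination."""
    return [
        {"checkpoint": ckpt, "steps": steps, "concept": concept}
        for ckpt, steps, concept in product(checkpoints, step_counts, concepts)
    ]


grid = build_grid(CHECKPOINTS, STEP_COUNTS, CONCEPTS)
fine_tuned = {(row["checkpoint"], row["steps"]) for row in grid}

print(len(fine_tuned), "fine-tuned checkpoints,", len(grid), "samples to judge")
```

Keeping the grid explicit makes it easy to check that every fine-tuned checkpoint was judged on all three concepts.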
Summary of Results
Overall, we got a lot of good fine-tuned checkpoints. The realistic photo was an easy task for all the models. With avatars, all did fine except for SD 1.4 Full ema. And finally, for Superman… we are not getting consistently good results, but if any model stood out, SD 1.5-pruned-emaonly did the best.
Loss Graphs and Training Time
Interestingly, both 1.5 checkpoints took a bit less time (46 minutes to reach 4k steps) than the 1.4 ones (51 minutes to reach 4k steps).
As for loss graphs, surprisingly, all four checkpoints produced identical graphs of loss versus step count.
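To put the timing difference in perspective, the two totals can be converted to a per-step cost. This is a quick back-of-the-envelope sketch using only the figures reported above; the helper name `seconds_per_step` is ours, and it assumes time scales evenly across steps.

```python
def seconds_per_step(total_minutes: float, steps: int) -> float:
    """Average wall-clock seconds per training step."""
    return total_minutes * 60 / steps


STEPS = 4000
sd15 = seconds_per_step(46, STEPS)  # 1.5 checkpoints: 46 minutes to 4k steps
sd14 = seconds_per_step(51, STEPS)  # 1.4 checkpoints: 51 minutes to 4k steps

# Relative time saved by the 1.5 checkpoints compared with the 1.4 ones.
saving = (51 - 46) / 51

print(f"SD 1.5: {sd15:.3f} s/step")
print(f"SD 1.4: {sd14:.3f} s/step")
print(f"Time saved: {saving:.1%}")
```

That works out to roughly 0.69 s/step versus 0.77 s/step, or about 10% less training time for the 1.5 checkpoints.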