Paper Review: Improved Techniques for Training Score-Based Generative Models
Paper information
Yang Song and Stefano Ermon. “Improved Techniques for Training Score-Based Generative Models”, NeurIPS (2020)
Abstract
Score-based generative models can produce high quality image samples comparable to GANs, without requiring adversarial optimization. However, existing training procedures are limited to images of low resolution (typically below 32x32), and can be unstable under some settings. We provide a new theoretical analysis of learning and sampling from score models in high dimensional spaces, explaining existing failure modes and motivating new solutions that generalize across datasets. To enhance stability, we also propose to maintain an exponential moving average of model weights. With these improvements, we can effortlessly scale score-based generative models to images with unprecedented resolutions ranging from 64x64 to 256x256. Our score-based models can generate high-fidelity samples that rival best-in-class GANs on various image datasets, including CelebA, FFHQ, and multiple LSUN categories.
Limitations of previous work
The authors' earlier 2019 work (the noise conditional score network, NCSN) was only practical for low-resolution images (around 32x32), for two reasons:
- The method learns scores at multiple noise levels, which lets it capture both coarse and fine-grained image information and learn to denoise Gaussian perturbations, but it offered no principled way to choose those noise levels. The levels proposed in the previous paper performed reasonably at 32x32 resolution but did not work well at higher resolutions.
- Langevin dynamics converged slowly to the data distribution when run in high dimensions with inaccurate score estimates.
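To make the sampling procedure concrete, here is a minimal sketch of annealed Langevin dynamics over a decreasing sequence of noise levels. It is an illustration under stated assumptions, not the authors' implementation: `toy_score` is a hypothetical stand-in for a trained score network (the exact score of standard normal data perturbed by Gaussian noise), and the step size `eps`, the number of steps per level, the Gaussian initialization, and the noise range are all illustrative choices.

```python
import numpy as np

def toy_score(x, sigma):
    """Exact score of N(0, I) data perturbed with N(0, sigma^2 I) noise.

    Hypothetical stand-in for a trained score network s_theta(x, sigma).
    """
    return -x / (1.0 + sigma ** 2)

def annealed_langevin(score_fn, sigmas, n_samples=1000, dim=2,
                      eps=2e-5, steps_per_level=100, seed=0):
    """Run Langevin dynamics at each noise level, from largest to smallest."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize from Gaussian noise at the largest noise level.
    x = rng.normal(scale=sigmas[0], size=(n_samples, dim))
    for sigma in sigmas:
        # Step size shrinks with the noise level (proportional to sigma^2).
        alpha = eps * (sigma / sigmas[-1]) ** 2
        for _ in range(steps_per_level):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x

# Assumed schedule: 10 geometrically spaced levels from 1.0 down to 0.01.
sigmas = np.geomspace(1.0, 0.01, 10)
samples = annealed_langevin(toy_score, sigmas)
```

With an exact score the chain settles near the data distribution; the slow convergence described above arises when the score is only an estimate and the dimension is large.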
Contributions
- Proposes a theoretically motivated method for choosing Gaussian noise levels suited to the data distribution
- Improves Langevin dynamics so that sampling converges faster
- Maintains an exponential moving average (EMA) of model weights to stabilize training
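The exponential moving average of weights can be sketched in a few lines. This is a minimal illustration, assuming parameters are stored in a plain dictionary of NumPy arrays; the helper name `ema_update` and the momentum value of 0.999 are assumptions made for the example, and the "+ 1.0" update is only a placeholder for a real optimizer step.

```python
import numpy as np

def ema_update(ema_params, params, momentum=0.999):
    """Blend the current weights into the running average.

    momentum=0.999 is an assumed value chosen for illustration.
    """
    return {name: momentum * ema_params[name] + (1.0 - momentum) * params[name]
            for name in params}

params = {"w": np.array([1.0, 2.0])}
# The averaged copy starts out equal to the initial weights.
ema_params = {name: value.copy() for name, value in params.items()}

for step in range(3):
    params["w"] = params["w"] + 1.0  # placeholder for an optimizer step
    ema_params = ema_update(ema_params, params)
```

The averaged copy changes much more slowly than the raw weights, which is why sampling from it tends to smooth out step-to-step fluctuations in training.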
Choosing noise levels
Key questions about the previous setup:
- Was the initial (largest) noise level large enough?
- Is a geometric progression the best way to space the noise levels?
- Was using 10 noise levels a good choice?
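For reference, the setup these questions examine can be sketched as a geometric progression of noise levels. The function name `geometric_noise_levels` is hypothetical, and the endpoints 1.0 and 0.01 are assumed values for illustration; only the geometric spacing and the count of 10 levels come from the text above.

```python
import numpy as np

def geometric_noise_levels(sigma_max, sigma_min, num_levels):
    """Noise levels from sigma_max down to sigma_min with a constant ratio."""
    return np.geomspace(sigma_max, sigma_min, num_levels)

# Assumed endpoints; 10 levels as in the setup being questioned.
sigmas = geometric_noise_levels(1.0, 0.01, 10)

# In a geometric progression, every consecutive ratio is the same.
ratios = sigmas[1:] / sigmas[:-1]
```

Each of the three questions maps to one argument: `sigma_max` is the initial noise level, the constant ratio is the geometric spacing, and `num_levels` is the number of levels.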