Concept Of Diffusion Model

Updated: April 19, 2023

The diffusion model is a recent generative model that has been developed across many areas, starting from the 2015 paper by Sohl-Dickstein et al. and the 2019 paper by Song and Ermon. While those two papers generated images, the approach has since been extended to many data types, such as 3D images¹, audio², video³, proteins⁴⁵, molecules⁶, and even functions⁷. This article explains the general structure of this popular model.

Basic structure of the Diffusion model

A diffusion model is a generative model that computes the reverse diffusion of a data distribution and generates synthetic data from a random initialization. Let’s look at each keyword in detail.

Data distribution and Generative model

If we want to create A, we need to clarify two things. First, we need to know what space A lives in. For example, what space does an image belong to? In the digital world, an image is a vector containing the RGB information of each pixel. Therefore, an image of size $w \times h$ is an element of $\qty{0, 1, \cdots, 255}^{3wh}$. If we map each RGB value to a real number between 0 and 1 by dividing it by 255, the image can also be viewed as an element of $[0, 1]^{3wh}$. Audio is composed of sound signals $a_t$ at time steps $t = 1, \cdots, T$, so it is an element of $\mathbb{R}_+^{T}$. A molecule is composed of atoms and can be viewed as a network whose nodes are atoms and whose edges are bonds, so a molecule with $n$ atoms is an element of $\qty{ 1, \cdots, N }^n \times \qty{ 0, 1, 2, 3 }^{n(n - 1)/2}$, where the first factor assigns one of $N$ atom types to each atom and the second assigns a bond type to each pair of atoms.
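The image case above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `image_to_unit_vector` and the 2 x 2 example image are hypothetical, and the image is represented as nested lists of RGB triples rather than through any particular imaging library.

```python
def image_to_unit_vector(pixels):
    """Flatten an h x w grid of (R, G, B) byte triples into a vector in [0, 1]^(3wh)."""
    vector = []
    for row in pixels:
        for r, g, b in row:
            for channel in (r, g, b):
                if not 0 <= channel <= 255:
                    raise ValueError("channel values must lie in 0..255")
                # Dividing by 255 maps {0, ..., 255} into the interval [0, 1].
                vector.append(channel / 255)
    return vector


# A hypothetical 2 x 2 image (w = 2, h = 2), written as rows of RGB triples.
tiny_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
x = image_to_unit_vector(tiny_image)
print(len(x))  # 3 * w * h = 12
```

Every image of the same size becomes a point in the same 12-dimensional space, which is what lets us talk about where images are located in that space.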

Then, can every vector in the space $[0, 1]^{3wh}$ be considered an image? Is every element of $\mathbb{R}_+^{T}$ a sound? Is every element of $\qty{ 1, \cdots, N }^n \times \qty{ 0, 1, 2, 3 }^{n(n - 1)/2}$ a molecule? Of course not. For example, the left figure below is not considered an image.

Figure: random noise (left) and a real image (right).

This is related to the question ‘What is A in the space?’ If there is a condition that A must satisfy, and everything that satisfies the condition can be called A, then we can generate A by creating arbitrary data that satisfies the condition.

However, what should we do if the condition is hard to state? What condition must something satisfy to be a ‘picture’ or a ‘photo’? If the condition is hard to explain, how can we find an appropriate one?

Machine learning lets the machine find the condition from the training data, and we say that the machine learns the data distribution. In other words, it learns where the data judged to be A is located in the entire space, and data located where the probability of being A is high is judged to be A.

Here, there is one important assumption: if data is similar to the training data of A, it is A. For example, if we change a few pixels of something we recognize as an image, or add very small noise to it, we still recognize the result as an image. Based on this, if we train the machine to create data around the region where the training data is located, that is, data close to the training data, we can create quite realistic fake data. (If a small change can easily turn the data into something else, each data cluster occupies a smaller region and there are more boundaries between clusters, so a model that learns those boundaries needs more training data.)

However, this method is strongly affected by the distribution of the training data, so it is important to build unbiased training data that adequately represents the desired data distribution.

From this point of view, a generative model is, in a broad sense, a random number generator. It is a model that produces values that follow the learned data distribution.
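The "random number generator" view can be made concrete with a deliberately simple stand-in. The sketch below is not a diffusion model: it assumes one-dimensional data and "learns" the distribution by estimating only a mean and a standard deviation. The names `fit_gaussian` and `generate`, and the training data itself, are hypothetical.

```python
import random
import statistics


def fit_gaussian(samples):
    """'Learn' a 1-D data distribution by estimating its mean and standard deviation."""
    return statistics.mean(samples), statistics.stdev(samples)


def generate(mean, std, n, rng):
    """Draw n new values that follow the learned distribution."""
    return [rng.gauss(mean, std) for _ in range(n)]


rng = random.Random(0)
# Hypothetical training data: 1,000 values clustered around 5.0.
training_data = [rng.gauss(5.0, 0.5) for _ in range(1000)]

mean, std = fit_gaussian(training_data)
synthetic = generate(mean, std, 1000, rng)
print(round(statistics.mean(synthetic), 1))
```

The generated values are new numbers, not copies of the training data, yet they fall in the same region of the space. A real generative model plays the same role for far more complicated distributions, such as those of images.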

Reverse diffusion

The core of how a diffusion model creates random data is reverse diffusion. What does it mean to reverse diffusion? First, let’s think about what diffusion is.

Physically, diffusion means the flow of particles from a high-density area to a low-density area. Individual particles do not move according to density; each one moves randomly. Some particles move from a high-density area to a low-density area, and others move the opposite way. However, because far more particles move from high to low density, diffusion emerges as a macroscopic phenomenon.

To carry this over to data, imagine the ‘space of data’ described earlier and assume that it contains countless data points. Then give each data point a very small amount of noise so that it moves slightly in the data space. If we repeat this, the data will gradually spread out and cover the entire space evenly.
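A toy simulation can illustrate why purely random motion still produces a net flow. The sketch below assumes a hypothetical setup with two boxes, where every particle hops to the other box with the same small probability at each step, regardless of how crowded either box is. The function name and all parameter values are illustrative.

```python
import random


def simulate_two_boxes(n_left, n_right, hop_probability, steps, rng):
    """Each particle independently hops to the other box with the same probability."""
    history = [(n_left, n_right)]
    for _ in range(steps):
        # No particle looks at the density: the hop rule is identical on both sides.
        left_to_right = sum(rng.random() < hop_probability for _ in range(n_left))
        right_to_left = sum(rng.random() < hop_probability for _ in range(n_right))
        n_left += right_to_left - left_to_right
        n_right += left_to_right - right_to_left
        history.append((n_left, n_right))
    return history


rng = random.Random(0)
history = simulate_two_boxes(
    n_left=900, n_right=100, hop_probability=0.05, steps=200, rng=rng
)
print(history[0], history[-1])
```

Because the crowded box has more particles available to hop, more of them leave it than enter it, and the two counts drift toward an even split even though no single particle prefers either direction.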
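The same idea applied to data points can be sketched as follows. This is a minimal illustration under assumptions: the points are two-dimensional, the noise is Gaussian with a fixed small standard deviation, and the names `diffuse` and `spread` are hypothetical. It uses an unbounded random walk, so the cloud simply keeps widening.

```python
import random
import statistics


def diffuse(points, noise_std, steps, rng):
    """Repeatedly add small independent Gaussian noise to every coordinate of every point."""
    points = [list(p) for p in points]  # copy so the original data is left untouched
    for _ in range(steps):
        for p in points:
            for i in range(len(p)):
                p[i] += rng.gauss(0.0, noise_std)
    return points


def spread(points):
    """Average per-coordinate standard deviation, used as a rough measure of spread."""
    dims = len(points[0])
    return statistics.mean(
        [statistics.pstdev([p[i] for p in points]) for i in range(dims)]
    )


rng = random.Random(0)
# Hypothetical original data: 500 points tightly clustered around (0.5, 0.5).
original = [(rng.gauss(0.5, 0.01), rng.gauss(0.5, 0.01)) for _ in range(500)]
diffused = diffuse(original, noise_std=0.05, steps=100, rng=rng)
print(spread(original) < spread(diffused))
```

After enough steps the tight cluster has dissolved into a broad cloud, and the positions of the diffused points carry almost no trace of where the original data was concentrated.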

Figure: original data.

References

1. Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. “Dreamfusion: Text-to-3d using 2d diffusion”. arXiv (2022)
2. Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. “Diffwave: A versatile diffusion model for audio synthesis”. arXiv (2020)
3. Vikram Voleti, Alexia Jolicoeur-Martineau, and Christopher Pal. “Mcvd: Masked conditional video diffusion for prediction, generation, and interpolation”. NeurIPS (2022)
4. Kevin E Wu, Kevin K Yang, Rianne van den Berg, James Y Zou, Alex X Lu, and Ava P Amini. “Protein structure generation via folding diffusion”. arXiv (2022)
5. Joseph L. Watson et al. “Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models”. bioRxiv (2022)
6. Minkai Xu, Lantao Yu, Yang Song, Chence Shi, Stefano Ermon, and Jian Tang. “Geodiff: A geometric diffusion model for molecular conformation generation”. arXiv (2022)
7. Jae Hyun Lim et al. “Score-based diffusion models in function space”. arXiv (2023)
8. Jascha Sohl-Dickstein et al. “Deep Unsupervised Learning using Nonequilibrium Thermodynamics”. PMLR (2015)
9. Yang Song and Stefano Ermon. “Generative Modeling by Estimating Gradients of the Data Distribution”. NeurIPS (2019)

Myeongseon Choi
