
How does diffusion generate images? — An iterated blending view of diffusion

VisionAI Synthesis
7 min read · Jul 23, 2024


In this article, we aim to develop a simpler view of diffusion by framing it as iterated blending.

Diffusion-based generative imaging is a class of generative modeling. Well-known systems such as DALL-E 2, Imagen, and Stable Diffusion are all based on this technique. An example is shown below.

DALL-E 2 generated this image when given the prompt “Teddy bears working on new AI research underwater with 1990s technology”. Source: https://cdn.openai.com/dall-e-2/demos/text2im/teddy_bears/ai_research/underwater/4.jpg

These systems let us generate a wide variety of images from a textual description. In the early twentieth century, Arthur Brisbane said that ‘a picture is worth a thousand words’. With diffusion models, we can now achieve the reverse: from a few words of textual description, we can generate an image.

Scott Reed et al. first showed how to generate realistic images from a textual description in their work ‘Generative Adversarial Text to Image Synthesis’ (ICML 2016).

Yet we can now achieve this far better with diffusion: the generated images show more variety, higher resolution, and greater photorealism, and diffusion-based models are now widely considered superior to earlier generative techniques. Through this article, we intend to better understand diffusion as iterated blending.

What is diffusion?

