While Transformers are seen as the most significant innovation in AI in recent years, powering the AI summer we currently find ourselves in, there’s another extremely powerful technology that is discussed less: diffusion.
One of the hardest and most essential parts of training neural networks is defining a high-quality evaluation. By what metric should we train the model? Before Stable Diffusion, vision models were mostly trained to recognize objects in images through bounding boxes, semantic segmentation, or mere classification. Researchers trying to build the best models measured themselves against the benchmarks available at the time, and those benchmarks did not yet make it possible to train models that create “beautiful art”.
Enter diffusion. The idea behind diffusion is quite simple. To create an evaluation function for the neural net we’re training, we take an image as it is, apply progressively stronger random distortion, and then teach the neural net to go from a more distorted version to a less distorted one. This has been hooked up with the CLIP model, which places images and text in the same latent space. Now we can describe an image in text and, through a multi-iteration approach, arrive at a high-quality image output.
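To make the "distort, then learn to undo it" idea a bit more concrete, here is a minimal NumPy sketch of how a training example could be built. It assumes a DDPM-style setup (a linear noise schedule, Gaussian noise, and a loss on the predicted noise); the function names, the step count, and the schedule values are illustrative choices, not something taken from a specific model, and the "model prediction" is just a placeholder.

```python
import numpy as np

def make_noise_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear schedule; returns the fraction of signal left at each step."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)  # shrinks toward 0 as the step index grows

def distort(image, t, alpha_bar, rng):
    """Return a noisy copy of `image` at step t, plus the noise that was mixed in."""
    noise = rng.standard_normal(image.shape)
    noisy = np.sqrt(alpha_bar[t]) * image + np.sqrt(1.0 - alpha_bar[t]) * noise
    return noisy, noise

def denoising_loss(predicted_noise, true_noise):
    """Mean squared error between the model's guess and the noise actually added."""
    return float(np.mean((predicted_noise - true_noise) ** 2))

rng = np.random.default_rng(0)
alpha_bar = make_noise_schedule()
image = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a real image

# Progressively stronger distortion: an early step barely changes the image,
# a late step leaves almost pure noise.
lightly_noised, _ = distort(image, 10, alpha_bar, rng)
heavily_noised, noise = distort(image, 999, alpha_bar, rng)

# A real network would predict `noise` from (heavily_noised, t).
# Here an all-zeros guess stands in for the untrained model.
loss = denoising_loss(np.zeros_like(noise), noise)
```

The useful property is that the training signal comes for free: every image yields as many (noisy input, target) pairs as we like, with no human labels required.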
This idea of starting with something of lower quality and increasing its “resolution” or quality to train a neural net isn’t limited to vision models. We’ll see it applied in all kinds of areas:
CodeFusion: An LLM using diffusion for better code generation
Diffusion-QL: Diffusion-based policies in reinforcement learning
Stable Diffusion: Image synthesis through diffusion
There are dozens more applications of diffusion; a more comprehensive list can be found here. The results of the recently released CodeFusion are very impressive. With just 75M parameters, it’s able to compete with models of up to 175B parameters. This is very promising news because it suggests you won’t need hundreds of A100 GPUs to make an impact in AI research. What else will we be able to disrupt with diffusion? And more importantly, what will be the next big idea that changes the game again?