Diffusion Models

Diffusion is the dominant architecture behind the AI-generated images that have flooded the internet and the courts. While the process is technically complex, the legal implications are straightforward: it is a system that, by design, requires the mass copying of existing images in order to function. The debate is not whether copying occurs, but whether that copying is legally defensible.

Analogy: The Sculptor’s Clay

Imagine a sculptor who wants to create a new statue of a “brave warrior.”

  1. The Library: First, the sculptor goes to a museum containing thousands of statues of warriors. He takes a high-resolution 3D scan of every single one—their poses, their armor, their facial expressions. He does this without permission. This is the training data.
  2. The “Liquid Clay”: He doesn’t keep the individual scans. Instead, he feeds them into a machine that creates a kind of “liquid clay”—a statistical amalgam of all the statues. This clay is not a copy of any single statue, but it contains the essence of all of them. The patterns of “warrior-ness” are encoded within it. This is the trained diffusion model.
  3. The Creation: To make his new statue, the sculptor starts with a random, unformed block of this special clay (random noise). He then provides a description: “A brave warrior with a horned helmet.” Guided by the patterns embedded in the clay, he slowly refines the block, pushing and pulling, until it takes the form of a new statue that matches the description. This is the denoising process.

The final statue is new, but it is made entirely from the essence of the statues that were scanned without permission. And sometimes, a recognizable piece of an original statue—a specific helmet design, a unique shield emblem—reappears in the final work.
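
To make the analogy concrete, the sketch below is a deliberately toy version of both halves of the process, written in Python. Everything in it is hypothetical and simplified (the “model” is a single number rather than a billion-parameter neural network, and there is no text conditioning), but the structure mirrors the real pipeline: training repeatedly noises copies of real images and learns to reverse the damage; generation starts from pure noise and denoises step by step.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_noise(image, t, T=1000):
    """Forward process: blend a training image with Gaussian noise.
    At t=0 the image is intact; near t=T it is almost pure noise."""
    alpha = 1.0 - t / T                       # fraction of the original kept
    noise = rng.normal(size=image.shape)
    return alpha * image + (1.0 - alpha) * noise, noise

def train_step(weight, image, lr=1e-4, T=1000):
    """One grossly simplified training step: noise the image at a random
    timestep, then nudge the 'model' toward predicting that noise.
    Precondition: a copy of the training image is already in memory."""
    t = int(rng.integers(1, T))
    noisy, true_noise = forward_noise(image, t, T)
    predicted = weight * noisy                # stand-in for a neural denoiser
    grad = ((predicted - true_noise) * noisy).mean()  # d(MSE)/d(weight)
    return weight - lr * grad

def generate(weight, shape, T=1000):
    """Reverse process: start from an unformed 'block of clay' (pure
    noise) and iteratively subtract the noise the model predicts."""
    x = rng.normal(size=shape)
    for _ in range(T):
        x = x - (1.0 / T) * (weight * x)      # one small denoising step
    return x

# Hypothetical usage: 'train' on a fake 8x8 grayscale image, then generate.
weight = 0.5
image = rng.random((8, 8))
for _ in range(2000):
    weight = train_step(weight, image)
new_image = generate(weight, (8, 8))
```

The point to hold onto is in `train_step`: the function cannot run unless a copy of a training image is already sitting in memory. The learning is downstream of the copying, never a substitute for it.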

This process raises several critical points of legal contention.

  1. The “Copying” is in the Training: The primary act of infringement, plaintiffs argue, happens during the training phase. The model cannot learn the “pattern” of a warrior without first making copies of the warrior statues (the first sketch after this list shows where that copying happens in a training pipeline). The AI companies’ defense is that these copies are intermediate and that the final “liquid clay” model is a transformative new work. But the initial copying is an undeniable fact of the process.

  2. Style is Not an Abstract Concept: AI companies claim their models learn “styles,” not specific images. But a style is the aggregate of a specific artist’s work. To learn the “style of Van Gogh,” the model must first be trained on copies of Van Gogh’s actual paintings. You cannot separate the style from the work. Plaintiffs argue that this “style extraction” is itself a form of infringement, as it allows a company to profit from an artist’s entire body of work without compensation.

  3. Memorization and Watermarks: The “liquid clay” sometimes contains “lumps”: fragments of the original statues that never fully melted down. This is memorization. A diffusion model can and does memorize some of its training images (the second sketch after this list shows a simple test for it). When a generated image contains a distorted but recognizable artist’s watermark or signature, it is damning evidence that the model was trained on that specific artist’s work. It is the model’s own “fingerprint” left at the scene of the crime.
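
The copying described in point 1 is visible in the very first step of any image-training pipeline. The sketch below uses hypothetical file paths and standard Python imaging libraries; before a model can learn anything, every training image must be decoded into an in-memory pixel array, which is a complete reproduction of the work.

```python
from pathlib import Path

import numpy as np
from PIL import Image

def load_training_batch(image_dir: str, size=(256, 256)) -> np.ndarray:
    """Decode every image in a directory into a pixel array.

    Each array is a full reproduction of the underlying work, held in
    memory so the model can compute its training loss against it."""
    batch = []
    for path in sorted(Path(image_dir).glob("*.jpg")):
        img = Image.open(path).convert("RGB").resize(size)  # decoding = copying
        batch.append(np.asarray(img, dtype=np.float32) / 255.0)
    return np.stack(batch)

# Hypothetical directory of scraped images. The copying happens here,
# regardless of what the final model weights do or do not contain.
# pixels = load_training_batch("scraped_images/")
```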
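
The memorization described in point 3 is also empirically testable. A simple (and simplified) test is a nearest-neighbor search: compare a generated image against the training set and flag suspiciously close matches. The sketch below uses raw pixel distance for clarity; published extraction studies use perceptual embeddings instead, which catch near-copies that pixel distance misses, and the threshold here is hypothetical.

```python
import numpy as np

# Hypothetical cutoff; a real study would calibrate this against image
# pairs known to be independent.
MEMORIZATION_THRESHOLD = 0.01

def nearest_training_image(generated: np.ndarray, training_set: np.ndarray):
    """Return the index of, and distance to, the closest training image.

    generated:    shape (H, W, C), pixel values in [0, 1]
    training_set: shape (N, H, W, C), pixel values in [0, 1]"""
    diffs = training_set - generated[None, ...]
    distances = np.mean(diffs ** 2, axis=(1, 2, 3))  # MSE against each image
    idx = int(np.argmin(distances))
    return idx, float(distances[idx])

def flag_if_memorized(generated: np.ndarray, training_set: np.ndarray):
    """A tiny distance means the 'new' image is a near-copy of a specific
    training image: a lump of un-melted clay."""
    idx, dist = nearest_training_image(generated, training_set)
    return (idx, dist) if dist < MEMORIZATION_THRESHOLD else None
```

For a litigator, a hit below the threshold converts an abstract claim (“the model learned from my client’s work”) into a concrete exhibit: a specific output matched to a specific training image.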

Diffusion models are not magic. They are complex statistical engines that operate on a foundation of mass-scale data copying. Understanding the “sculptor’s clay” analogy allows a litigator to move past the abstract claims of “learning” and focus on the concrete technical and legal realities of the process.