Regurgitation

Regurgitation is the AI industry’s sanitized term for what is, in effect, plagiarism. It occurs when a generative model produces an output that is a verbatim copy of material from its training data. AI companies often frame this as a rare and unintentional bug, but from both a technical and a legal standpoint it is an unavoidable consequence of the model’s design.

Analogy: Dreaming

Think of how a human brain dreams. A dream is not created from nothing; it is constructed from the fragments of our memories and experiences.

  • The Dream World: Most of a dream is a remix. You might be in a house that is a combination of your childhood home and a set from a movie you saw. You might talk to a person who has the face of your friend but the voice of your boss. These are new creations, but they are built from the training data of your life.
  • The Vivid Memory: Sometimes, in the middle of a dream, you experience a moment of perfect, vivid recall. You might re-live a specific conversation word for word, or see an exact image of a page from a book you read. This isn’t a “bug” in your dream; it’s just your brain accessing a specific, well-encoded memory.

Regurgitation is the AI’s version of that vivid, verbatim memory. A large language model is a dream machine for text. Most of what it generates is a remix of the statistical patterns in its training data. But when prompted correctly, or sometimes just by chance, it will find a strongly encoded memory (a unique line of code, a specific legal disclaimer, a verse from a poem) and reproduce it exactly.
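
The distinction between remix and recall is mechanically testable. Below is a minimal sketch of one way to detect regurgitation: slide a fixed-size window across the model’s output and flag any run of tokens that also appears, word for word, in a reference corpus. Everything here is an illustrative assumption rather than any vendor’s actual tooling: the function names, the naive whitespace tokenizer, and the 50-token threshold (a window size in line with those used in memorization research).

    # A minimal sketch, assuming a small reference corpus that fits in memory.
    # All names and the 50-token threshold are illustrative assumptions.

    def ngrams(tokens, n):
        """Yield every contiguous n-token window of a token list."""
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i : i + n])

    def find_verbatim_overlap(output_text, corpus_texts, n=50):
        """Return each n-token span of the output that also appears,
        word for word, somewhere in the reference corpus."""
        corpus_windows = set()
        for doc in corpus_texts:
            corpus_windows.update(ngrams(doc.split(), n))
        out_tokens = output_text.split()
        return [" ".join(w) for w in ngrams(out_tokens, n)
                if w in corpus_windows]

    # Usage: a remix produces no matches; regurgitation produces at least
    # one. A real audit would use hashing or a suffix index to scale, but
    # the principle is the same: a long exact match is evidence of
    # memorization, not coincidence.
    # matches = find_verbatim_overlap(model_output, training_documents)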

The existence of regurgitation fundamentally undermines the AI industry’s core legal defenses.

  1. It Disproves the “Abstract Learning” Defense: The primary argument for fair use is that AI models learn “abstract concepts” and “ideas,” not the specific expression of a work. Regurgitation is the proof that this is false. The model is clearly storing and reproducing the specific, protected expression. You cannot argue you only learned the “idea” of a poem when you can recite the poem verbatim.

  2. It is Not “Rare”: AI companies will argue that regurgitation is a one-in-a-million event. But when you are generating billions of outputs for millions of users, one-in-a-million events happen thousands of times a day (see the back-of-the-envelope calculation after this list). More importantly, security researchers have shown that regurgitation can be induced: specific “adversarial prompts” can be designed to deliberately extract memorized training data, including personal information and copyrighted content.

  3. The Burden of Proof Shifts: In a copyright case, the plaintiff normally has to prove that the defendant copied their work. When an AI model regurgitates copyrighted text, the act of copying is self-evident. The burden of proof effectively shifts to the AI company to explain why their machine, which they built and profit from, is outputting infringing copies of a protected work.
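
To make point 2 concrete, here is the back-of-the-envelope calculation, where both the rate and the daily volume are illustrative assumptions rather than measured figures:

    # Illustrative arithmetic for point 2 above; both numbers are assumptions.
    regurgitation_rate = 1e-6        # the claimed "one in a million" rate
    outputs_per_day = 2_000_000_000  # assumed daily generations across all users
    incidents = regurgitation_rate * outputs_per_day
    print(f"{incidents:,.0f} verbatim copies per day")  # prints: 2,000 verbatim copies per day

Even at the industry’s own claimed rate, and at this assumed volume, memorized output is a daily occurrence measured in the thousands, not a freak event.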

Regurgitation is not an anomaly. It is the ghost in the machine—the undeniable proof that the model’s “mind” is filled with the memories of the data it consumed. For a litigator, every instance of it is a gift, a clear and simple piece of evidence that cuts through the industry’s complex and self-serving narratives.