Extractive Proof of Copyright Infringement: Bypassing the VLM Black Box Defense

Published on June 2, 2025, by Duarte André V.

Key Takeaways

  • DIS-CO is a novel, external probing method that leverages a VLM's free-form text output to infer the inclusion of specific copyrighted visual content in its training corpus.
  • The technique undermines the efficacy of 'black box' defenses by providing litigants and regulators with an actionable tool to verify training data provenance without requiring internal access.
  • Empirical testing confirms that all tested Vision-Language Models exhibit significant exposure to copyrighted material, highlighting pervasive, unmitigated compliance risks.

Original Paper: DIS-CO: Discovering Copyrighted Content in VLMs Training Data

Authors: Duarte André V., Zhao Xuandong, Oliveira Arlindo L.

The central technical and legal challenge in contemporary AI copyright disputes is data provenance: how can a plaintiff or regulator verify what specific content was used to train a massive, proprietary model without direct access to the developer’s internal data logs? This practical difficulty provides the foundation for the pervasive “black box” defense.

A recent and highly relevant study, “DIS-CO: Discovering Copyrighted Content in VLMs Training Data,” by Duarte André V., Zhao Xuandong, and Oliveira Arlindo L., delivers a pragmatic solution that fundamentally shifts the balance of proof.

A Pragmatic Account of the Research

The critical technical knot DIS-CO untangles is moving the inference of training data inclusion from statistical correlation (e.g., measuring image similarity or reconstruction quality) to verifiable identity extraction.

Vision-Language Models (VLMs) are trained to recognize and describe visual content. The hypothesis driving DIS-CO is straightforward: if a VLM was trained on copyrighted content (like a specific movie scene), it should possess a latent representation strong enough not just to describe the image visually, but to identify the specific source material upon request.

DIS-CO operates by repeatedly feeding the VLM specific frames from targeted copyrighted works (e.g., a popular film). Crucially, the method relies on the model’s free-form text completion capabilities, prompting it to answer questions like “What movie is this scene from?” or “Who is this character?” If the VLM consistently and accurately outputs the copyrighted identity, it serves as strong evidence that the model’s weights were sufficiently tuned by exposure to that content during training.
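
To make the probing loop concrete, here is a minimal Python sketch of a DIS-CO-style probe. It is illustrative only: the `query_vlm` wrapper is a hypothetical placeholder for whatever VLM client is available (the paper does not prescribe a specific API), and the score is simply the fraction of answers that name the correct title.

```python
# Minimal sketch of a DIS-CO-style free-form probe (illustrative only).

def query_vlm(image_path: str, prompt: str) -> str:
    """Hypothetical wrapper: send one frame plus a text prompt to a VLM
    and return its free-form text answer. Replace with a real client call."""
    raise NotImplementedError("plug in your VLM client here")

def probe_frames(frame_paths, ground_truth_title, n_samples=5):
    """Query the model several times per frame and report how often its
    free-form answer names the correct title."""
    prompt = "What movie is this scene from? Answer with the title only."
    hits = total = 0
    for path in frame_paths:
        for _ in range(n_samples):
            answer = query_vlm(path, prompt)
            total += 1
            if ground_truth_title.lower() in answer.lower():
                hits += 1
    return hits / total  # a high rate suggests training-time exposure

# Example usage (paths and title are placeholders):
# rate = probe_frames(["frame_001.jpg", "frame_002.jpg"], "Some Film Title")
# print(f"Correct-title rate: {rate:.1%}")
```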

This matters profoundly beyond academia because it transforms the audit process. It provides an externally verifiable, non-invasive forensic tool for litigants. For the industry, it means the traditional defense of simply claiming ignorance about the training set composition is now technically vulnerable.

Key Findings

The research validates this approach through the introduction of MovieTection, a benchmark comprising 14,000 frames from films released both before and after the training cutoff dates of the tested models, so that post-cutoff films can serve as unseen controls.

  • Identity Extraction is a Superior Signal: DIS-CO demonstrates that querying a VLM for the identity of the content (the title, the character name) is a far more robust indicator of training inclusion than prior methods focused on simple visual reconstruction or feature similarity.
  • Significant Performance Gain: On models where internal confidence scores (logits) were accessible, DIS-CO nearly doubled the average Area Under the Curve (AUC) relative to the best existing baseline methods for detecting training set exposure, establishing it as the current state of the art for external verification (a sketch of this AUC evaluation framing follows this list).
  • Pervasive Exposure: The empirical results across multiple contemporary VLMs indicate that all tested models showed demonstrable exposure to copyrighted content. This finding confirms the widespread reliance on large, uncleansed, or non-permissibly licensed datasets in modern VLM development, translating directly into systemic compliance risk.
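
To see how such a detector is scored, the sketch below frames exposure detection as a binary classification problem, in the spirit of the paper’s AUC comparison: films released before the model’s cutoff are treated as possibly-seen positives, post-cutoff films as clean negatives, and the AUC measures how well the per-film probe score separates the two groups. The film names and scores here are placeholders, not results from the paper.

```python
# Sketch of the AUC evaluation framing (placeholder data, not paper results).
from sklearn.metrics import roc_auc_score

# Per-film probe scores (e.g., correct-title rates) and pre/post-cutoff labels.
films = {
    "pre_cutoff_film_a":  {"score": 0.82, "seen": 1},
    "pre_cutoff_film_b":  {"score": 0.64, "seen": 1},
    "post_cutoff_film_c": {"score": 0.08, "seen": 0},
    "post_cutoff_film_d": {"score": 0.15, "seen": 0},
}

y_true = [f["seen"] for f in films.values()]
y_score = [f["score"] for f in films.values()]

# AUC near 1.0 means the probe cleanly separates seen from unseen films;
# 0.5 means it does no better than chance.
print(f"Detection AUC: {roc_auc_score(y_true, y_score):.2f}")
```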

These findings have immediate implications for legal strategy, compliance, and industry norms:

For Plaintiffs and Regulators: DIS-CO provides a direct mechanism to establish the “access” element of copyright infringement claims, which often requires proving that the defendant had the opportunity to copy the work. By producing the VLM’s own output naming the copyrighted material, plaintiffs can generate powerful, objective evidence of training inclusion, circumventing the need for costly and often unsuccessful discovery attempts targeting internal training data logs. This shifts the focus of litigation from whether the content was copied at all to whether the use was fair or permissible.

For Industry and Defense Counsel: Developers of VLMs can no longer rely on the opacity of the training process as a primary legal shield. The existence of external auditing tools like DIS-CO necessitates an immediate and significant investment in robust data governance, provenance tracking, and content filtering mechanisms. Failure to implement these steps after the publication of this research could be interpreted in court as willful blindness to known infringement risks. Furthermore, because the method’s strongest results rely on access to logits, developers may face legal pressure to disclose these internal scores during discovery to validate or refute claims of exposure.

Risks and Caveats

While transformative, DIS-CO is subject to technical constraints that a skeptical litigator or expert examiner would immediately raise:

  1. Scope Limitation (High-Profile Content): The MovieTection benchmark focuses on highly recognizable, high-profile copyrighted films. DIS-CO is effective at detecting content that is unique and frequent enough within the training corpus for the model to successfully extract its identity. It may be less effective at detecting low-profile, obscure, or statistically rare copyrighted content that does not generate a strong, namable latent representation.
  2. Reliance on Logit Access: The method still works from free-form text alone, but its most definitive performance gains (nearly doubling the AUC) rely on access to the model’s internal logit scores (confidence levels). Publicly available, closed-source models often do not expose these scores, complicating the implementation of the most powerful version of the technique; see the sketch after this list.
  3. Proving Access, Not Quantity: DIS-CO proves the fact of exposure (access), but it does not quantify the degree of copying. It cannot tell a court whether the copyrighted material was used once or one million times, nor does it inherently prove that the model memorized the data in a way that constitutes an infringing output (though it provides strong evidence of internal retention).
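
To clarify the second caveat, here is a hedged sketch of what a logit-based variant looks like next to the free-form probe shown earlier. The `get_option_logits` call is hypothetical: it assumes the model exposes a raw score for each candidate title, which most closed-source deployments do not.

```python
# Sketch of a logit-based membership signal (hypothetical access assumed).
import math

def get_option_logits(image_path: str, prompt: str, options: list[str]) -> list[float]:
    """Hypothetical: return the model's raw score (logit) for each candidate
    answer, e.g., a list of candidate movie titles."""
    raise NotImplementedError("requires white-box or logprob-exposing access")

def confidence_in_true_title(image_path, options, true_title):
    """Softmax the option logits and return the probability mass the model
    assigns to the correct title for this frame."""
    prompt = "Which movie is this scene from?"
    logits = get_option_logits(image_path, prompt, options)
    exps = [math.exp(z) for z in logits]
    probs = [e / sum(exps) for e in exps]
    return probs[options.index(true_title)]
```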

The era of opaque VLM training data is effectively over; external, verifiable proof of copyrighted input is now technically feasible.