Original Paper: Leave No TRACE: Black-box Detection of Copyrighted Dataset Usage in Large Language Models via Watermarking
Authors: Antiquus S. Hippocampus, Natalia Cerebro & Amelie P. Amygdale (Department of Computer Science, Cranberry-Lemon University, Pittsburgh, PA 15213, USA; {hippo,brain,jen}@cs.cranberry-lemon.edu) and Ji Q. Ren & Yevgeny LeNet (Department of Computational Neuroscience, University of the Witwatersrand, Joburg, South Africa; {robot,net}@wits.ac.za)
TLDR:
- A new technical framework (TRACE) allows rights holders to forensically detect the use of their proprietary datasets in a third-party LLM’s fine-tuning process.
- Detection operates entirely in a black-box setting, using an entropy-gated mechanism that analyzes model outputs without requiring access to internal signals such as logits or weights.
- This verifiable method shifts the burden of proof in IP litigation by offering concrete, statistical evidence of unauthorized dataset ingestion, crucial for modern copyright claims.
The challenge of establishing unauthorized data usage within modern Large Language Models (LLMs) is fundamentally a legal problem constrained by technical opacity. When a model developer fine-tunes a powerful base model on a smaller, high-value dataset—often proprietary or copyrighted—how can the rights holder prove infringement without gaining intrusive access to the developer’s proprietary architecture or internal metrics?
This critical technical and legal knot is addressed by Antiquus S. Hippocampus, Natalia Cerebro, and Amelie P. Amygdale, alongside Ji Q. Ren and Yevgeny LeNet, in their work, “Leave No TRACE: Black-box Detection of Copyrighted Dataset Usage in Large Language Models via Watermarking.”
The paper delivers a pragmatic account of how watermarking can transition from a theoretical defense mechanism to a verifiable, forensic tool. Existing methods, such as membership inference attacks (MIAs), are often too unreliable in black-box settings or require internal signals (e.g., logits) that commercial entities rightly guard as proprietary. TRACE bypasses this access barrier by embedding a distortion-free, private-key-guided watermark directly into the dataset before it is used for fine-tuning. The detection mechanism then exploits what the authors term the “radioactivity effect” of fine-tuning: the model’s internal biases, induced by the watermarked data, subtly manifest in its downstream token generation probabilities. By analyzing the model’s output entropy—specifically scoring high-uncertainty tokens where the watermarking signal is most likely to surface—TRACE achieves statistically significant detection, making it highly relevant to legal discovery and evidence standards.
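To make the detection logic concrete, the minimal Python sketch below illustrates entropy-gated scoring. It assumes, purely for illustration, a simple hash-based pseudorandom partition of candidate tokens keyed by the rights holder's private key, and a proxy-model estimate of per-token entropy (the suspect model's own logits are unavailable in the black-box setting); the paper's distortion-free construction and exact scoring rule may differ, but the gate-then-test logic is analogous. All names and constants here are illustrative.

```python
import hashlib
import math

GREEN_FRACTION = 0.5   # illustrative "green-list" fraction; the paper's choice may differ
ENTROPY_GATE = 2.0     # illustrative entropy threshold (nats) below which tokens are ignored

def is_green(prev_token: str, token: str, secret_key: bytes) -> bool:
    """Pseudorandom green-list membership, keyed by the private key and local context."""
    digest = hashlib.sha256(secret_key + prev_token.encode() + token.encode()).digest()
    return digest[0] / 256.0 < GREEN_FRACTION

def entropy_gated_z_score(tokens, proxy_entropies, secret_key):
    """Score only high-uncertainty tokens, then form a z-statistic against the null
    hypothesis that the suspect model never saw the watermarked dataset."""
    hits, n = 0, 0
    for prev, tok, ent in zip(tokens, tokens[1:], proxy_entropies[1:]):
        if ent < ENTROPY_GATE:   # low-entropy tokens carry little watermark signal
            continue
        n += 1
        hits += is_green(prev, tok, secret_key)
    if n == 0:
        return 0.0
    # Under the null, hits ~ Binomial(n, GREEN_FRACTION); a large z suggests ingestion.
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std
```

In practice, the scored tokens would come from sampled API responses of the suspect model, and a large z-score supports rejecting the null hypothesis that the watermarked dataset was never ingested.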
Key Findings
- Distortion-Free, Key-Guided Watermarking: TRACE ensures that the embedded watermark does not degrade the quality of the text or impair the utility of the dataset for downstream tasks. This is essential for practical adoption, as previous watermarking techniques often introduced unacceptable performance trade-offs. The use of a private key ensures that only the rights holder or a court-appointed expert can verify the presence of their specific mark, maintaining the integrity of the forensic process.
- Entropy-Gated Black-Box Detection: The core technical breakthrough is the ability to detect the watermark without internal model access. By selectively focusing the detection scoring procedure on tokens where the model exhibits high uncertainty, the signal-to-noise ratio is dramatically amplified. This negates the common legal defense that detection requires revealing trade secrets (model internals).
- Robustness Against Dilution: The framework demonstrates resilience against common post-training operations, including continued pretraining on vast non-watermarked corpora. From a litigator’s perspective, this counters the defense argument that the original fine-tuning signal was effectively “washed out” or erased by subsequent development steps.
- Multi-Dataset Attribution: TRACE can attribute the resulting model’s behavior to multiple, distinct watermarked datasets simultaneously. This is critical in complex commercial scenarios where models are fine-tuned on mosaics of licensed and proprietary data, allowing a rights holder to prove that their specific material, not a generic corpus, was utilized (a multi-key detection sketch follows this list).
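As a rough illustration of that multi-dataset attribution, the entropy-gated detector sketched earlier can be run once per licensed dataset, each with its own private key, with a multiple-testing correction to control the family-wise false-positive rate. The function below reuses the hypothetical entropy_gated_z_score from the previous snippet and is a sketch under the same assumptions, not the paper's exact attribution procedure.

```python
from scipy.stats import norm  # assumed available for the normal quantile

def attribute_datasets(tokens, proxy_entropies, keys_by_dataset, alpha=0.01):
    """Flag every dataset whose private key yields a significant watermark z-score.
    A Bonferroni correction bounds the family-wise false-positive rate by alpha."""
    threshold = norm.ppf(1 - alpha / len(keys_by_dataset))
    flagged = {}
    for name, key in keys_by_dataset.items():
        z = entropy_gated_z_score(tokens, proxy_entropies, key)
        if z > threshold:
            flagged[name] = z
    return flagged

# Hypothetical usage: test whether either of two licensed corpora was ingested.
# flagged = attribute_datasets(sampled_tokens, entropies,
#                              {"news_archive": b"key-A", "clinical_notes": b"key-B"})
```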
Legal and Practical Impact
These findings directly influence the evidentiary landscape of AI copyright litigation.
First, TRACE offers a path to establishing the often-elusive “copying” element of infringement in LLM cases. Rather than relying on speculative arguments about output similarity or requiring invasive access to source code, a rights holder can present statistically robust, forensic evidence demonstrating that the proprietary material was ingested during the model’s development lifecycle (i.e., fine-tuning). This shifts the legal focus from the model’s performance to its verifiable input history.
Second, for compliance and licensing, this technology transforms the relationship between data providers and model developers. Rights holders can move beyond simple contractual promises of non-use. They can now mandate the use of TRACE-style watermarking in licensed datasets, establishing a verifiable technical compliance mechanism. This turns the LLM supply chain from a trust exercise into one governed by auditable, cryptographic proofs.
Third, in discovery, a court could potentially mandate the use of the rights holder’s private key by a neutral technical expert to verify the black-box model’s outputs, balancing the need for evidence with the defendant’s right to protect model weights and architecture as trade secrets.
Risks and Caveats
While promising, expert examiners and skeptical litigators must acknowledge the scope and limitations of the TRACE framework.
Crucially, TRACE is designed to detect usage during the fine-tuning or post-pretraining phase. It does not address the challenge of detecting unauthorized data usage within the initial, massive pre-training corpora (e.g., Common Crawl), which often involves petabytes of mixed data. The technical demands of watermarking and verifying data at that scale remain prohibitive.
Furthermore, the integrity of the system relies fundamentally on the security and management of the private key. If the key is compromised or the watermarking process is executed improperly, the forensic evidence loses its statistical validity and legal weight.
Finally, while the paper claims robustness against general continued pretraining, the possibility of dedicated adversarial model unlearning or scrubbing techniques specifically engineered to target and erase the watermarking signal remains an unsettled question. The arms race between attribution and evasion is ongoing, and future models may incorporate countermeasures.
Watermarking is evolving from a theoretical concept to a forensic tool that can technically and legally prove the unauthorized use of proprietary data in commercial LLMs, fundamentally reshaping the dynamics of data licensing and copyright litigation.