Fine-Tuning AI Turns Derivative Works into Preferred Market Substitutes, Quantifying Copyright Harm

Published on October 15, 2025, by Tuhin Chakrabarty, Jane C. Ginsburg, and Paramveer Dhillon

Key Takeaways

  • Expert and lay readers strongly prefer text generated by AI models *fine-tuned* on complete copyrighted works over text written by expert human writers.
  • This preference reversal is driven by fine-tuning eliminating detectable "AI stylistic quirks" (e.g., cliché density), rendering the outputs nearly undetectable by current tools.
  • The resulting high-quality, preferred outputs provide concrete empirical evidence of market substitution, significantly strengthening the fourth factor analysis in copyright litigation.

Original Paper: Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers

Authors: Tuhin Chakrabarty; Jane C. Ginsburg; Paramveer Dhillon
Stony Brook University; Columbia Law School; University of Michigan; MIT Initiative on the Digital Economy

The ongoing legal dispute regarding the unauthorized use of copyrighted literary material to train large language models (LLMs) hinges on the economic question of market harm: can AI emulate human literary style well enough to function as a replacement for human authors? A recent study, “Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers,” co-authored by Tuhin Chakrabarty, Jane C. Ginsburg, and Paramveer Dhillon, moves the debate from theoretical possibility to demonstrated economic reality.

Pragmatic Account of the Research

The core technical-legal knot this research untangles is the empirical assessment of market substitution under the fourth factor of the Fair Use doctrine (17 U.S.C. § 107(4)): “the effect of the use upon the potential market for or value of the copyrighted work.” Before this study, claims of market harm resulting from AI generation were largely speculative or based on anecdotal evidence of plagiarism.

This research establishes a functional, empirically verified link between the method of AI training and the resultant competitive threat. By comparing standard in-context prompting (zero-shot/few-shot methods) with fine-tuning (training on an individual author’s complete works), the authors isolated the specific technical step that transforms a merely derivative output into a preferred market substitute.
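
To make that distinction concrete, here is a minimal sketch of the two conditions, assuming an open-weights model and the HuggingFace transformers/peft/datasets stack. The model id, corpus file path, and hyperparameters below are illustrative placeholders, not the paper’s actual setup, and LoRA adaptation is shown as one common fine-tuning approach rather than the study’s exact method.

```python
# Minimal sketch of the study's two conditions (assumptions: gpt2 as a stand-in
# base model, LoRA fine-tuning, and a hypothetical corpus file path).
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import load_dataset
from peft import LoraConfig, get_peft_model

MODEL_ID = "gpt2"  # placeholder; the paper's base models are not reproduced here

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Condition 1: in-context prompting. The model sees only an instruction;
# the author's works are never used as training data.
prompt = "Write a 450-word passage in the style of Author X about a storm."
inputs = tokenizer(prompt, return_tensors="pt")
prompted_output = model.generate(**inputs, max_new_tokens=600)

# Condition 2: fine-tuning on the author's complete works
# (hypothetical file path; parameter-efficient LoRA adaptation shown here).
corpus = load_dataset("text", data_files={"train": "author_x_complete_works.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_data = corpus["train"].map(tokenize, batched=True, remove_columns=["text"])

peft_model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                              task_type="CAUSAL_LM"))
trainer = Trainer(
    model=peft_model,
    args=TrainingArguments(output_dir="ft_author_x", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=train_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Same prompt, but now answered by a model adapted to the author's corpus.
fine_tuned_output = peft_model.generate(**inputs, max_new_tokens=600)
```

The legally salient point the sketch makes visible: condition 1 never ingests the author’s works at inference or training time, while condition 2 updates model weights directly on the complete copyrighted corpus.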

For thoughtful professionals operating at the intersection of law and technology, this matters immensely. If an AI output is empirically preferred by readers—including literary experts—and is stylistically indistinguishable from human work, it ceases to be merely a transformative input and becomes a direct, superior competitor in the marketplace. This data provides plaintiffs with concrete evidence to quantify potential damages and forces defendants relying on the Fair Use defense to confront a quantified market displacement effect.

Key Findings

The study’s results hinge on the stark contrast between standard LLM usage and targeted fine-tuning:

  • In-Context Prompting Yields Inferior Results: When experts (MFA candidates) evaluated text generated via standard in-context prompting (e.g., asking ChatGPT to “write in the style of Author X”), they strongly disfavored the AI output for both stylistic fidelity and writing quality (Odds Ratio [OR] for quality = 0.13; see the worked conversion after this list). This confirms the intuitive understanding that basic prompting often results in stylistically clumsy output.
  • Fine-Tuning Reverses Expert Preference: Fine-tuning the LLM on the specific author’s complete works dramatically reversed these preferences. Experts now overwhelmingly favored the fine-tuned AI-generated text for stylistic fidelity (OR=8.16) and quality (OR=1.87). This indicates that the technical effort of fine-tuning successfully transfers the unique, marketable stylistic elements of the source author into the generative model.
  • Elimination of AI Quirks Drives Superiority: Mediation analysis revealed that this preference reversal occurred because fine-tuning eliminated detectable “AI stylistic quirks” (such as excessive cliché density) that penalize standard outputs. Crucially, the fine-tuned text was rarely flagged as AI-generated by state-of-the-art detection tools (3% detection rate), making the high-quality output practically indistinguishable from human work.
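
For readers less familiar with odds ratios, the short sketch below converts the reported ORs into implied head-to-head preference rates, under the simplifying assumption that each OR can be read as the odds of a reader preferring the AI text in a pairwise comparison; the paper’s actual regression models may include covariates this abstracts away.

```python
# Back-of-the-envelope conversion of the reported odds ratios into implied
# pairwise preference rates (assumption: OR = odds of preferring the AI text
# in a head-to-head comparison, ignoring covariates in the paper's models).
reported_ors = {
    "in-context prompting, quality": 0.13,
    "fine-tuned, stylistic fidelity": 8.16,
    "fine-tuned, quality": 1.87,
}

for condition, odds in reported_ors.items():
    # Odds-to-probability conversion: p = odds / (1 + odds).
    p_ai = odds / (1 + odds)
    print(f"{condition:32s} OR = {odds:4.2f} -> AI preferred ~{p_ai:.0%}")

# Illustrative output:
#   in-context prompting, quality    OR = 0.13 -> AI preferred ~12%
#   fine-tuned, stylistic fidelity   OR = 8.16 -> AI preferred ~89%
#   fine-tuned, quality              OR = 1.87 -> AI preferred ~65%
```

Read this way, fine-tuning moves the AI text from losing roughly nine out of ten quality comparisons to winning roughly two out of three, and to near-total dominance on stylistic fidelity.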

Implications for Litigation and Compliance

These findings do not merely settle an academic debate; they provide powerful, actionable evidence for litigation and compliance strategy.

Fair Use and Market Harm Quantification: The demonstration that fine-tuned AI output is not just acceptable, but preferred by human readers over expert human writing, directly strengthens the argument that the AI output acts as a market substitute. This shifts the analysis of Fair Use Factor 4 from mere speculation about potential harm to evidence of demonstrated competitive superiority. Litigants representing rights-holders can now argue that the unauthorized use of copyrighted works enables the creation of a product that is objectively better—and therefore more damaging to the original market—than human-authored alternatives.

Compliance and Risk Management: For technology companies and industry consortia developing generative models, this research highlights the elevated legal risk associated with model fine-tuning. While general LLM training datasets might be defensible under some transformative use theories, the targeted, author-specific fine-tuning process creates outputs that are demonstrably highly derivative and commercially competitive. Compliance strategies must now account for the differential risk between general model training and targeted fine-tuning, likely necessitating more robust, specific licensing agreements for data used in model specialization.

Damages Assessment: If the resulting AI text is preferred by the market, the economic value—and therefore the potential damages—associated with the unauthorized use of the training material increases. The analysis of statutory damages or actual market harm must now incorporate the finding that the infringing product is not just competing with, but potentially outperforming, human-authored works in the original market.

Risks and Caveats

While the evidence is compelling, a skeptical litigator or expert examiner would raise several critical technical limitations:

  1. Scope Boundary: The study’s results are based on blind comparisons of short excerpts (up to 450 words). Generating a cohesive, novel-length work requires sustained narrative skill, character consistency, and structural integrity that the study does not test. The cost and effort required to transform raw fine-tuned output into a publishable full-length manuscript—the “human effort required to transform raw AI output,” as the authors note—are not factored into the economic analysis.
  2. Evolving Detectability: The 3% detection rate is a snapshot against current state-of-the-art detectors. The arms race between generative models and detection tools is constant; if future detection methods become more reliable, the commercial viability and legal risk profile of these outputs could change quickly (a minimal screening sketch follows this list).
  3. Data Dependency: The success of fine-tuning is entirely dependent on the completeness and quality of the copyrighted input. This implies that the highest legal risk is concentrated on uses involving the entire corpus of a prolific author, rather than fragments.
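
To illustrate what detector screening looks like in practice, here is a minimal sketch using one publicly available AI-text detector through the HuggingFace pipeline API. The model id below is a dated, off-the-shelf example chosen for availability, not the state-of-the-art detector suite the study actually evaluated, and the passages are placeholders.

```python
# Minimal detector-screening sketch (assumptions: one public, dated detector
# model as a stand-in for the study's detection tools; placeholder passages).
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",
)

passages = [
    "A placeholder excerpt generated by a fine-tuned model.",
    "A placeholder excerpt written by an expert human writer.",
]

for text in passages:
    result = detector(text)[0]  # e.g. {'label': 'Fake', 'score': 0.97}
    print(f"{result['label']:>4s} ({result['score']:.2f}): {text}")
```

The study’s finding is precisely that such screening fails on fine-tuned output: only 3% of those passages were flagged, so compliance pipelines that rely on detectors alone offer little protection against this class of text.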

Take-Away

When a model is fine-tuned on specific copyrighted works, the resulting high-fidelity output is not just derivative; it is empirically a superior market substitute, fundamentally changing the risk profile of unauthorized training data use under Fair Use analysis.