Quantitative Overview

Litigation Impact Metrics 2023–2025

Metrics reflect 14 plaintiff-side AI training misuse matters handled between Q1 2023 and Q3 2025. Data spans federal district courts in the Northern District of California, Southern District of New York, Western District of Washington, and the English High Court.1

Methodology Overview

Each metric aggregates matter-level deltas comparing client baseline processes against S-Square-led workflows. We captured discovery timecards, evidence yields, and motion outcomes, then normalized for case complexity using a four-factor index (number of custodians, dataset size, opposing party posture, cross-border discovery).6

  • Baseline benchmarks derive from counsel-provided time studies or interrogatory responses.
  • Confidence intervals were computed via bootstrap resampling (10,000 draws) with the bias-corrected and accelerated (BCa) adjustment.
  • Metrics based on sparse data (fewer than 8 matters) are flagged and accompanied by qualitative context.
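The BCa bootstrap procedure above can be sketched in a few lines with SciPy. The per-matter deltas below are illustrative placeholders, not the actual matter data; `scipy.stats.bootstrap` supports the bias-corrected and accelerated method directly.

```python
import numpy as np
from scipy import stats

# Hypothetical per-matter deltas in review hours (S-Square workflow minus
# baseline). These values are illustrative, not the actual matter data.
deltas = np.array([-310, -265, -244, -290, -221, -258, -247, -275, -236])

# BCa bootstrap CI on the mean delta, mirroring the 10,000-draw,
# bias-corrected and accelerated procedure described above.
res = stats.bootstrap(
    (deltas,), np.mean,
    n_resamples=10_000,
    confidence_level=0.95,
    method="BCa",
    random_state=np.random.default_rng(0),
)
print(res.confidence_interval)
```

The `(deltas,)` tuple is required because `bootstrap` accepts a sequence of samples; the resulting interval brackets the sample mean of the deltas.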

Early Technical Integration Compression

Matters in which technical experts were engaged before the initial Rule 26(f) conference saw a median 21-day reduction in the time spent resolving discovery scheduling disputes and a 38% reduction in document review hours compared to same-jurisdiction controls.2

Discovery Effort Benchmarks

Across 9 matters with comparable ESI volume, average document review hours dropped from 642 (baseline) to 398 after implementing dataset attribution memos and focused subpoenas. The 95% confidence interval for the delta is [-302, -238] hours.

Key driver: narrowing custodians via graph analysis of dataset ingestion pipelines, eliminating 31% of originally scoped custodians.3
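The custodian-narrowing step amounts to a reachability query over the ingestion-pipeline graph: a custodian stays in scope only if some pipeline path connects their systems to the disputed training corpus. A minimal sketch, with entirely hypothetical node names:

```python
from collections import deque

# Hypothetical ingestion-pipeline graph: custodians feed systems, which feed
# the disputed training corpus. All node names are illustrative placeholders.
edges = {
    "custodian_A": ["crawler"],
    "custodian_B": ["crawler"],
    "custodian_C": ["hr_system"],        # dead end: never reaches the corpus
    "crawler": ["dedup_pipeline"],
    "dedup_pipeline": ["training_corpus"],
}

def reaches(graph, start, target):
    """Breadth-first search: does any ingestion path connect start to target?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

scoped = ["custodian_A", "custodian_B", "custodian_C"]
# Retain only custodians with a directed path into the training corpus.
relevant = [c for c in scoped if reaches(edges, c, "training_corpus")]
print(relevant)  # ['custodian_A', 'custodian_B']
```

Custodians whose systems never feed the corpus (here, `custodian_C`) drop out of scope, which is the mechanism behind the 31% reduction described above.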

Verbatim Evidence Yield

Controlled prompting campaigns yielded, on average, 11.4x as many verbatim and near-verbatim outputs as counsel-run prompts. Measurement counted unique matches above a 0.88 cosine similarity threshold (sentence-BERT embeddings) across 1,200 scripted prompts per matter.4

Peak gains were observed when prompts incorporated targeted n-gram reconstruction seeded from leaked dataset manifests rather than generic narrative requests.
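The threshold-matching step can be illustrated with plain NumPy. The embeddings below are random stand-ins; in practice they would come from a sentence-BERT encoder, and the 0.88 cutoff matches the threshold stated above. One output is seeded as a deliberate near-verbatim hit.

```python
import numpy as np

# Illustrative sketch: count model outputs whose embedding exceeds the 0.88
# cosine-similarity threshold against any plaintiff passage. Vectors here are
# random placeholders standing in for sentence-BERT embeddings.
rng = np.random.default_rng(1)
plaintiff = rng.normal(size=(5, 384))   # 5 plaintiff source passages
outputs = rng.normal(size=(8, 384))     # 8 model responses
outputs[0] = plaintiff[2] + rng.normal(scale=0.05, size=384)  # near-verbatim

def unit(v):
    """Normalize rows to unit length so dot products are cosine similarities."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

sims = unit(outputs) @ unit(plaintiff).T        # pairwise cosine similarities
hits = int((sims.max(axis=1) >= 0.88).sum())    # outputs above the threshold
print(hits)
```

Only the seeded output crosses the threshold; unrelated high-dimensional vectors sit near zero cosine similarity, which is why the cutoff cleanly separates near-verbatim reproductions from coincidental overlap.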

Model Disclosure Motion Outcomes

Of 12 motions to compel model documentation or training data disclosures supported by our statistical attribution reports, 11 were granted in whole or substantial part. The lone denial stemmed from a protective order limiting access to third-party data under GDPR constraints.57

Orders granting relief commonly cited the combination of membership inference testing and provenance tracing as evidence of likely use of plaintiff data.
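A simplified form of the membership inference testing cited in those orders compares the model's loss on a plaintiff passage against its loss distribution on comparable held-out text: a loss far below the non-member baseline suggests the passage was in the training set. The loss values below are fabricated placeholders, not real model outputs.

```python
import statistics

# Toy loss-threshold membership inference test. Per-sample losses on known
# non-member text form the baseline; all numbers here are fabricated.
holdout_losses = [3.1, 2.9, 3.4, 3.0, 3.2, 2.8, 3.3, 3.1]  # non-members
candidate_loss = 1.4                                        # plaintiff passage

mu = statistics.mean(holdout_losses)
sigma = statistics.stdev(holdout_losses)
z = (candidate_loss - mu) / sigma   # standard score vs. non-member baseline

# Flag as a likely training member when loss sits >3 sd below the baseline.
likely_member = z < -3.0
print(round(z, 2), likely_member)
```

Production-grade tests are considerably more involved (calibrated reference models, multiple samplings), but the core signal is the same: anomalously low loss on the plaintiff's specific text.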