Background
A group of anonymous software developers sued GitHub, Microsoft, and OpenAI alleging that the AI-powered coding tools GitHub Copilot and OpenAI Codex were trained on their licensed open-source repositories without attribution or licence compliance. The plaintiffs claim violations of the Digital Millennium Copyright Act (DMCA § 1202(b)), breach of contract, and unfair-competition law. The court’s 2023 and 2024 dismissal orders removed several counts but allowed the core DMCA and contract claims to proceed. Subsequent filings through 2025 concern discovery management and possible consolidation of related dockets. The proceedings remain active and continue toward trial.
AI interaction
The pleadings describe how Copilot and Codex use machine-learning models 'trained on "billions of lines" of publicly available code,' placing dataset ingestion and algorithmic replication at the centre of the dispute. The court accepted those factual allegations as plausible at the pleading stage, framing the case as a test of whether AI-model training on open-source material can breach licence terms or remove copyright-management information. The litigation has become a reference point for accountability and attribution in large-scale AI training on public data.
Note:
The case remains pending with continuing discovery activity. The court’s most recent substantive ruling (ECF No. 195, January 2024) granted in part and denied in part motions to dismiss, allowing DMCA § 1202(b) and breach-of-contract claims to proceed. Overlapping plaintiffs have filed additional related actions raising substantially similar allegations: Doe, et al. v. Github, Inc., et al. (24-6136); Doe 3 v. GitHub, Inc. (4:22-cv-07074); and Doe, et al. v. Github, Inc., et al. (24-7700). These cases are treated as companion or successor proceedings and may be coordinated with the lead docket before trial.