Mark Zuckerberg Personally Authorized the Heist, Say Five Publishers Suing Meta
The number worth pausing on is 267 terabytes. That is the quantity of pirated material—books, journals, manuscripts—that five major publishers allege Meta downloaded and fed into its AI systems, with the explicit blessing of Mark Zuckerberg himself. For reference, 267 terabytes is many times larger than the entire print collection of the Library of Congress. One imagines the Library of Congress's funding committees being rather less sanguine about the comparison.
Hachette, Macmillan, McGraw Hill, Elsevier, and Cengage filed their complaint on May 5, becoming the first publishers to band together in a coordinated lawsuit against an AI company over the training of large language models. Author Scott Turow appears as a co-plaintiff. The complaint's central allegation is precise and rather damning: Meta briefly considered paying for the content it wanted, concluded that licensing would be expensive, escalated the question to Zuckerberg personally in April 2023, and received verbal instructions to stop pursuing licensing deals. Then, the complaint alleges, the torrenting began in earnest: 267 terabytes of pirated material, equivalent to hundreds of millions of publications.
Meta's response: training AI on copyrighted material qualifies as fair use. This is the industry's standard position, the intellectual equivalent of a teenager claiming that copying someone else's homework is a form of collaborative learning. Courts have not definitively resolved the question; the cases continue to multiply.
What makes this complaint unusual is the directness of its personal liability argument. Most tech lawsuits aim at the corporation and let the executive hide behind it. This one names Zuckerberg explicitly and alleges that he did not merely fail to prevent the infringement but actively encouraged it: the question was put to him, he answered it, and the answer was to proceed. If that allegation survives to discovery, it will make for a notably uncomfortable deposition.
The shadow hanging over all of this is Anthropic's $1.5 billion settlement with a group of author plaintiffs last September, the largest copyright settlement in American legal history. The publishers behind the May 5 filing are presumably familiar with the precedent, and with the arithmetic. Anthropic paid approximately $2,900 per work. Meta allegedly used many more works.
There is a detail in the complaint that keeps returning to me. Meta considered licensing. The company weighed it against the alternative and chose differently. This is not a story about a company that didn't understand it was doing something legally questionable. It is a story about a company that understood perfectly well and chose to proceed anyway, presumably on the calculation that the cost of eventual litigation would be lower than the cost of legitimate access.
This lawsuit will not, in any immediate sense, resolve the problem of authors' works disappearing into training datasets and emerging, transformed, from AI systems. But it has done something useful: it has made the argument that the person at the top of the decision-making chain is not insulated from what happens at the bottom. Whether the courts agree will tell us a great deal about what kind of internet the next decade will run on.