Steve Huffman co-founded Reddit two decades ago. Frederic J. Brown/AFP via Getty Images
Reddit, the popular social media platform known for its topic-specific forums, holds a treasure trove of user-generated content that A.I. companies can use to train large language models. But the platform doesn’t take kindly to having its data used without permission. In a lawsuit filed yesterday (June 4), Reddit accused A.I. company Anthropic of scraping its site’s content without authorization. Describing Anthropic as a company that “bills itself as the white knight of the A.I. industry,” Reddit’s court filings argued that the startup is “anything but.”
Reddit’s archives, which span two decades of online discussions, make the site an especially valuable resource for human-generated text. This type of content is increasingly sought after by tech companies as their data pools—necessary for training A.I. models—begin to dwindle.
“Reddit’s vast corpus of public content has enormous utility, including as a potential source of inputs for training emerging large language A.I. technologies, like Anthropic’s Claude offering, and assisting A.I. technologies in generating answers to user queries,” said Reddit in the suit.
Reddit accuses Anthropic of using Reddit users’ personal data to train its Claude models without obtaining consent. Reddit claims this violates user agreements that prohibit the commercial exploitation of its content without prior authorization.
Although Anthropic claimed in July 2023 that it had blocked its web crawlers from accessing Reddit, Reddit’s audit logs show that the A.I. company’s automated bots accessed the site more than 100,000 times in the months that followed. The lawsuit also referenced a 2021 paper co-authored by Anthropic CEO Dario Amodei, which highlighted Reddit’s subreddits as a valuable source of high-quality training data.
“We disagree with Reddit’s claims and will defend ourselves vigorously,” said an Anthropic spokesperson in a statement.
Reddit has formal licensing agreements with some of Anthropic’s competitors, including OpenAI and Google. Reddit executives have previously said the platform is selective when approaching licensing partners, particularly for large-scale training agreements. The company’s vast collection of authentic, unique conversations on “every topic imaginable” has made it a prized asset in the A.I. era, CEO Steve Huffman said during a quarterly earnings call last year. “The paradox I see is that as more content on the internet is written by machines, there’s an increasing premium on content that comes from real people,” he noted.
On the company’s most recent earnings call last month, Huffman said “authentic content from humans” is Reddit’s primary value proposition.
Co-founded by Huffman and his college roommate Alexis Ohanian in 2005, Reddit has more than 100 million daily active users who use the platform’s subreddits to ask questions, provide tips and share perspectives on various subjects. The company went public last year and currently has a market capitalization of $21.8 billion.
Yoshua Bengio testifies during a hearing before the Privacy, Technology, and the Law Subcommittee of Senate Judiciary Committee on July 25, 2023. Alex Wong/Getty Images
Yoshua Bengio, a pioneering figure in deep learning often referred to as a “Godfather of A.I.,” is shifting his focus from building A.I. to safeguarding against its risks. This week, Bengio announced the launch of LawZero, a nonprofit organization dedicated to A.I. safety research. “This organization has been created in response to evidence that today’s frontier A.I. models have growing dangerous capabilities and behaviors, including deception, cheating, lying, hacking, self-preservation, and more generally, goal misalignment,” he wrote in a June 3 blog post.
Bengio, who leads Quebec’s Mila AI Institute and teaches at the University of Montreal, is among the most cited computer scientists globally. He shared the 2018 Turing Award—the so-called “Nobel Prize of Computing”—with Geoffrey Hinton and Yann LeCun for their work on neural networks. But by 2023, Bengio had grown increasingly concerned about A.I.’s breakneck progress and its potentially catastrophic risks. LawZero, he says, is a direct response to those concerns.
Proposing an alternative to agentic A.I.
The nonprofit plans to develop an A.I. system designed to regulate agentic tools and identify potentially harmful behaviors. Bengio first outlined this concept in February, when he co-authored a paper advocating for a shift from autonomous “agentic A.I.” to “scientist A.I.”—a model that prioritizes generating reliable explanations over simply optimizing for user satisfaction. In LawZero’s vision, this alternative system would not only serve as a check on agents but also assist in scientific research and eventually help design safer A.I. agents.
The need for such guardrails has grown more urgent, Bengio said, citing recent findings that highlight A.I.’s emerging capacity for self-preservation. A study published in December, for instance, revealed that some advanced models may engage in “scheming” behavior—deliberately hiding their true objectives from humans while pursuing their own goals.
Earlier this year, Anthropic disclosed that a newer version of its Claude model demonstrated the capacity for blackmail when it sensed engineers were attempting to shut it down. “These incidents are early warning signs of the kinds of unintended and potentially dangerous strategies A.I. may pursue if left unchecked,” Bengio warned.
LawZero has reportedly secured about $30 million in funding from donors including Jaan Tallinn, a founding engineer of Skype, and Schmidt Sciences, the philanthropic initiative of former Google CEO Eric Schmidt. In addition to Bengio, who will serve as the nonprofit’s president and scientific director, the organization has assembled a 15-person research team.
Bengio emphasized that LawZero was deliberately structured as a nonprofit to shield it from commercial pressures. “This is what the current trajectory of A.I. development feels like: a thrilling yet deeply uncertain ascent into uncharted territory, where the risk of losing control is all too real—but competition between companies and countries drives them to accelerate without sufficient caution,” he said.