Reddit Sues Perplexity AI for Alleged Data Theft
A New Battle in the AI Content Wars
The ongoing tension between human creativity and artificial intelligence has reached a new flashpoint. Reddit has filed a copyright infringement lawsuit against Perplexity AI, accusing the San Francisco-based startup of illegally scraping its massive archive of user posts for AI training purposes.
Filed in a New York federal court, the lawsuit marks the latest clash between content platforms and AI developers accused of using copyrighted material without approval or compensation.
Reddit’s Chief Legal Officer, Ben Lee, described the issue as part of a larger content crisis, warning that the AI industry is being fuelled by “industrial-scale data laundering.”

Reddit’s Claims: Unauthorized Data Harvesting at Scale
Reddit alleges that Perplexity AI used external data scraping services to extract millions of user posts without consent. According to court documents, three other entities are named in the lawsuit:
- Oxylabs (a Lithuanian data firm)
- SerpApi (a Texas-based scraping service)
- AWMProxy, which Reddit described as a “former Russian botnet.”
The company claims these groups disguised automated scrapers as real users and masked their identities and locations to avoid detection. Reddit argues that Perplexity not only benefited from this data extraction but actively participated in it, using the scraped content to train its AI-driven “answer engine.”
Reddit’s Legal Stand: Defending Human-Generated Content
“AI Companies Are in a Race for Quality Human Data”
Reddit’s legal chief Ben Lee characterized the case as a defense of authentic, human-generated content — the core substance that makes AI models smarter.
“AI companies are locked in an arms race for quality human content,” he said. “This race has fuelled an industrial-scale laundering economy, where scraped data is washed and reused under the guise of machine learning.”
With millions of daily user interactions and comments, Reddit calls itself “one of the largest collections of human conversation ever created,” a massive data trove that is now central to AI training disputes.
Perplexity AI Responds: “Our Work Is Ethical and Transparent”
Perplexity has denied any wrongdoing, stating it hasn’t yet received the lawsuit but stands firm on its principles of open knowledge.
“We will always fight vigorously for users’ rights to freely and fairly access public knowledge,” Perplexity said in a public statement.
The company maintains that its AI systems rely only on publicly available and fair-use content, and that it operates with “a principled and responsible approach to data.”
Meanwhile, Oxylabs and SerpApi have also rejected Reddit’s allegations, saying they had not been formally served with legal papers. Oxylabs executive Denas Grybauskas criticized Reddit for skipping direct communication before taking legal action. “Reddit made no attempt to speak with us directly,” he said, calling the complaint premature.
Failed Negotiations: How Talks Collapsed Between Reddit and Perplexity
According to reports in the Financial Times, Reddit confronted Perplexity AI over alleged data scraping earlier in the year. The platform even proposed a paid licensing partnership that would allow Perplexity to access Reddit data legally.
However, Aravind Srinivas, the founder and CEO of Perplexity, reportedly declined the offer. Reddit then escalated the matter by filing a formal complaint.
The lawsuit further claims that Reddit reached out to Google, asking whether Perplexity had used Google search results to indirectly collect Reddit content, bypassing the platform’s standard usage restrictions.
An Ironic Twist: Reddit Itself Sells AI Access
Ironically, Reddit has already monetized its own data, striking licensing deals with Google and OpenAI in 2024. These agreements allow the two AI giants to use Reddit’s data for training large language models (LLMs) under controlled conditions.
Reddit’s complaint thus doesn’t appear to challenge data access itself, but rather unauthorized and unregulated data scraping. The company claims Perplexity “illegally obtained and reused traffic data” without transparent agreements or compensation.
A Growing War Over Data in the AI Industry
The Reddit-Perplexity case is just the latest flashpoint in what many are calling the “AI data wars.” A rising number of lawsuits have already emerged across the U.S. against AI firms accused of misusing copyrighted content.
Notable examples include:
- The New York Times vs. OpenAI, over alleged use of journalistic archives.
- Authors vs. Meta and OpenAI, for training models on copyrighted books.
- Getty Images vs. Stability AI, for using copyrighted photographs.
In June, Reddit itself sued Anthropic, claiming the company scraped its website over 100,000 times in one year. Anthropic denied the allegations, arguing it stayed within acceptable fair-use boundaries.
This latest case against Perplexity underscores the growing tension surrounding who owns the internet’s collective wisdom — and who gets to profit from it.
The Broader Implications: Paying for Human Knowledge
“If AI Companies Want Our Content, They’ll Have to Pay for It”
With this lawsuit, Reddit appears to be drawing a line in the digital sand. The company argues that if AI developers want access to authentic human conversation, they should pay for legitimate licensing just as OpenAI and Google did.
The case could set a critical precedent for how data is valued, licensed, and used in the AI era. Should Reddit win, other online platforms, from YouTube to Quora, might follow suit and demand financial compensation from any AI developer whose models rely on user-generated content for training.
The Reddit vs. Perplexity AI lawsuit captures a pivotal moment in the evolution of artificial intelligence and digital ethics. It’s no longer about whether AI can use the internet’s knowledge — but how it does so, and under what terms.
Reddit, once a free-flowing hub of open discussions, is now defending that content as a valuable intellectual asset. As the legal battle unfolds, it could reshape the boundaries between human creativity and machine learning, defining the cost of knowledge in the age of AI.
FAQs
Q1. Why is Reddit suing Perplexity AI?
Reddit claims Perplexity illegally scraped its user-generated posts without consent to train its AI systems.
Q2. Who else is named in the lawsuit?
Reddit also listed Oxylabs, SerpApi, and AWMProxy, accusing them of providing or supporting unauthorized data scraping.
Q3. How has Perplexity responded?
Perplexity denies wrongdoing, stating it supports open access to public knowledge and operates responsibly with available data.
Q4. Didn’t Reddit already sell its data to AI firms?
Yes, Reddit has licensing deals with OpenAI and Google, but argues that Perplexity bypassed lawful access routes.
Q5. What impact could this lawsuit have on AI companies?
If Reddit wins, it may compel AI developers to pay for data access, setting a major precedent for online content licensing.