
The fight between AI companies and the websites that hate them

A lawsuit by online message board Reddit alleges a company hijacked its data and gives a glimpse at the knockdown boxing match behind chatbot conversations.

The Reddit app icon is seen on a smartphone. Reddit's lawsuit alleges that the start-up Perplexity benefited from improperly using its website as AI fuel. (Matt Slocum / AP)

A lawsuit by online message board Reddit gives you a glimpse at the knockdown boxing match behind chatbot conversations.

In one corner are artificial intelligence services that gobble information from across the internet to help you plan a vacation or create silly videos. In the other corner are companies that are sometimes unwilling or overwhelmed sources of that data.

In its lawsuit, similar to ones against AI companies by news organizations, Hollywood studios, book authors, and others, Reddit alleges that the start-up Perplexity benefited from improperly using its website as AI fuel.

The claims are an example of warnings from Reddit, Wikipedia, and others that, if the boxing match continues as is, AI services may kill the websites and other source material that we love.

Dating back at least to the death of Napster a quarter-century ago, there have been constant fights over technology upstarts that remix media and information or deliver it in new ways. AI could be the most intractable fight of all.

AI ‘bank robbers’ vs. Reddit

Our 20 years of Reddit debates about the best Welsh restaurants and quiet air conditioners are gold for AI services. They typically need truckloads of online information like that to “train” their computers and serve up responses to your AI queries.

Reddit knows how valuable it is and laid out ground rules for AI companies that wanted to profit from siphoning Reddit message boards in bulk: AI companies needed a paid contract with Reddit and to respect its guardrails.

Some companies, including Google and ChatGPT parent company OpenAI, agreed to Reddit’s terms. For AI companies that didn’t agree, Reddit put up digital walls to block the spiderlike software that crawls over websites to harvest their information.

According to Reddit, Perplexity’s CEO promised Reddit’s top lawyer more than a year ago to respect Reddit’s digital walls. Perplexity, which makes what it calls an AI “answer” engine and an AI-specialized web browser, instead found another way to siphon Reddit pages, Reddit says.

(The Washington Post has partnerships with Perplexity and OpenAI.)

Reddit’s lawsuit, filed Wednesday in a New York federal court, said that Perplexity hired at least one data-siphoning middleman to grab many billions of pages of Reddit material indirectly, from Google search results.

Those middlemen allegedly used technically sophisticated tactics to get around Google’s digital defenses against unwanted siphoning by bots. Reddit said that it obtained this information from a subpoena to Google in a different, secret lawsuit.

Reddit’s lawsuit compared what Perplexity and the bot-for-hire middlemen did to “bank robbers” who know they can’t get into the bank vault and “break into the armored truck carrying the cash instead.”

In a post on Reddit, Perplexity said that Reddit is after money. The lawsuit is a “sad example of what happens when public data becomes a big part of a public company’s business model,” Perplexity said.

Google said that it has “strong technical measures to prevent this type of malicious abuse, because it undermines the choices websites make about who can access their content.”

What this means for you

Experts have said that the law generally protects technology companies that take copyrighted materials like news articles, books, and movies and put them to a new, creative use. Many AI companies say that their products meet that legal standard.

Blake Reid, an associate professor at the University of Colorado Law School, said that Reddit’s case adds an extra wrinkle: The company doesn’t hold the copyright to Reddit posts. The people who created those posts do. Reid said that helps make the lawsuit’s outcome unpredictable.

Regardless, AI keeps running into a paradox: To be useful, new forms of AI rely on ingesting vast swaths of the past, present, and future internet. But doing so can increase costs and divert users from websites, which imperils the internet we use.

We’ve heard similar complaints before. Entertainment companies sued YouTube for giving you free access to their creations. Music companies have howled over TikTok letting you create dance videos to Taylor Swift tunes. News organizations have groused that Google and Facebook let you browse the news without buying newspapers or visiting news websites.

The content companies have typically found ways to grudgingly live with, and even profit from, the technology upstarts. AI is different, said Toshit Panigrahi, CEO of TollBit, which helps websites get paid for AI data collection.

AI services grab information at warp speed and at industrial scale from so many places, including news and entertainment sites, cruise operators, and furniture sellers. Panigrahi said that the old pattern — technology changes are good for us and the owners of digital creations — may no longer apply.

“This is changing how the internet works fundamentally,” he said.