Study: 95% of Perplexity.ai citations are also on the first page of Google

Summary:

I was investigating the new AI search engine, Perplexity.ai, and specifically how it sources the controversial citations that it scrapes and repurposes into its AI-driven answers.

Given how much spam exists on the internet, I was curious how Perplexity orders the inputs it then uses to generate results. In traditional web search, algorithms like Google’s PageRank used signals such as external links to separate quality content from spam. Without a similar ranking algorithm, Perplexity would surely be stuck sourcing the same automated bot-written content that once drowned so many of Google’s competitors.

In an analysis of 491 separate question searches, ~95% of the citations Perplexity.ai used to form its AI-driven results also appeared on Page 1 of a Google Search for the same question.

This means that either Perplexity’s ranking algorithms are remarkably similar to Google Search, an incredible feat given the decades of work and infrastructure that have gone into Google’s ranking algorithms, or Perplexity (and/or one of their data suppliers) is simply scraping Google’s search results to form the basis of their AI answers.

Methodology:

To source a list of commonly asked questions, I took a random list of popular keywords that reddit.com ranks for and that also contain words like ‘Who’, ‘What’, ‘How’. Given the source is, well, Reddit, I then did my best to filter out searches for sexual content.
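The filtering step above could be sketched roughly as follows. This is only an illustration, not the script used for the study: the post names ‘Who’, ‘What’ and ‘How’ but does not give the full word list, and the BLOCKLIST contents and function names here are hypothetical placeholders.

```python
import re

# Question words named in the post; the full list used in the study is not given.
QUESTION_WORDS = {"who", "what", "how"}

# Hypothetical placeholder for the terms used to drop adult-content searches.
BLOCKLIST = {"nsfw"}


def is_usable_question(keyword: str) -> bool:
    """Return True if the keyword contains a question word and no blocked term."""
    words = set(re.findall(r"[a-z']+", keyword.lower()))
    return bool(words & QUESTION_WORDS) and not (words & BLOCKLIST)


def pick_questions(keywords):
    """Keep only question-style keywords, preserving their original order."""
    return [k for k in keywords if is_usable_question(k)]
```

Matching on whole words rather than substrings avoids false hits such as "show" being counted as "how".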

I then wrote a small Python script that opened Perplexity.ai, entered the question, and scraped the content of the citations box that appears at the top. If you’re curious as to why there are 491 results in this study, it was originally going to be 750, but Perplexity are really good at blocking scrapers. Ironic, given the recent controversy about Perplexity itself ignoring robots.txt!

Finally, I used another tool to run the same questions through Google and record the results. Since I’m located in the UK and Perplexity answers were often citing UK-based sources, this scan was made on Google.co.uk.

I then compared the two, marking any citation that appeared in both. Of the 1,714 individual cited pages seen across those searches, 1,634 (about 95.3%) appeared on the first page of google.co.uk for the same query.
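For anyone wanting to reproduce the comparison, here is a minimal sketch of how the overlap could be counted. It assumes the two scrapes are held as dictionaries mapping each question to a list of URLs, and that URLs are matched on host and path only; the post does not say how matches were actually determined, so the normalize() rule and all names below are assumptions.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so scheme, 'www.' and trailing slashes don't block a match."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def count_overlap(citations, google_page1):
    """Return (matched, total) for citations that also appear on Google's first page.

    Both arguments map a query string to a list of URLs for that query.
    """
    matched = 0
    total = 0
    for query, cited_urls in citations.items():
        serp = {normalize(u) for u in google_page1.get(query, [])}
        for url in cited_urls:
            total += 1
            if normalize(url) in serp:
                matched += 1
    return matched, total


# The totals reported in this study:
share = 1634 / 1714
print(f"{share:.1%}")  # 95.3%
```

A stricter rule (exact URL match) or a looser one (domain only) would shift the percentage, so the matching rule is worth stating alongside any figure.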

Data can be found in this Google Sheet.

If you are covering this, please cite the source! :) I'm Mike Curtis, a London-based digital marketer who wrote a small free keyword research tool called Searchtoolbox.