We Analyzed 2.1M Sources Cited by AI Search Engines

April 3, 2026 / 14 min read
Guillaume Rufenacht
Sources analyzed: 2.1M
Unique domains: 567K
Projects tracked: 1,280
Study period: 5 months (Nov 2025 - Apr 2026)

Methodology

Geonimo monitors AI search visibility for brands by sending prompts to ChatGPT, Perplexity, Claude, and Gemini daily. Every response is parsed to extract the sources these models cite: the URLs, domains, page titles, and snippets they reference when answering user questions.

This study covers 2.1 million source citations extracted from AI responses between November 2025 and April 2026, across 1,280 active projects spanning industries from e-commerce and SaaS to finance, health, and real estate. Every data point is a real source that an AI model chose to cite in response to a real-world prompt.
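To make the extraction step concrete, here is a minimal Python sketch of how cited URLs could be reduced to domains and counted. It is an illustration, not Geonimo's production parser: the `domain_of` helper and the sample URLs are hypothetical, and stripping a leading `www.` is an assumption about how domains are normalized.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the lowercased hostname of a URL, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


# Hypothetical cited URLs, for illustration only.
cited = [
    "https://www.reddit.com/r/running/comments/abc/",
    "https://en.wikipedia.org/wiki/Running_shoe",
    "https://www.reddit.com/r/SaaS/comments/xyz/",
]

# Count citations per domain.
domain_counts = Counter(domain_of(u) for u in cited)
print(domain_counts.most_common())
# [('reddit.com', 2), ('en.wikipedia.org', 1)]
```

Keeping subdomains such as `en.wikipedia.org` intact matches how domains are listed in this study, where the English and French Wikipedias are counted separately.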

Finding #1: Corporate Pages Dominate AI Citations (60.6%)

The single largest category of sources cited by AI search engines is Corporate content — company websites, product pages, SaaS documentation, and brand-owned content. It accounts for 60.6% of all citations.

Source Type Distribution (2.1M sources)

Corporate: 1,278,980 (60.6%)
Editorial: 295,550 (14.0%)
UGC (User-Generated Content): 271,560 (12.9%)
Institutional: 203,970 (9.7%)
Reference: 57,690 (2.7%)

This is the most important finding in the dataset. AI search engines overwhelmingly cite brand-owned content. If your company website has well-structured, authoritative pages about what you do, AI models will surface them. This isn't a game dominated by media outlets or Wikipedia — it's dominated by the companies themselves.

Editorial content (news sites, tech publications, industry blogs) comes second at 14%, followed closely by user-generated content at 12.9%. The UGC number is driven almost entirely by Reddit and YouTube, which we'll cover next.

Finding #2: Reddit Is the #1 Cited Domain by a Massive Margin

Across all 2.1 million sources, Reddit is cited more than any other single domain — by a factor of 3x over the runner-up.

Top 15 Most Cited Domains

1. reddit.com: 180,460
2. youtube.com: 59,880
3. en.wikipedia.org: 49,930
4. fr.wikipedia.org: 20,790
5. arxiv.org: 12,330
6. linkedin.com: 11,170
7. shopify.com: 9,450
8. quora.com: 6,560
9. lemonde.fr: 5,780
10. facebook.com: 4,690
11. medium.com: 4,550
12. forbes.com: 4,280
13. runrepeat.com: 4,140
14. indeed.com: 3,960
15. techradar.com: 3,340

Reddit's dominance isn't surprising if you've been paying attention to SEO trends. Google has been pushing Reddit in organic results for over a year. Now AI models are doing the same thing — but for a different reason. AI values Reddit because discussions contain real user opinions, product comparisons, and experience-based answers that corporate pages often lack.

YouTube's second-place finish is significant. AI models cite video content even though they can't watch the videos — they rely on titles, descriptions, and transcripts. If your brand produces YouTube content, that content is being surfaced in AI responses.

Wikipedia's third place confirms what we've long suspected: AI models use Wikipedia as a trust anchor. If your brand or topic has a Wikipedia page, AI models are more likely to include it as a foundational reference.

Notice what's NOT at the top: major news sites. Forbes is at #12 with 4,280 citations, and TechRadar is at #15 with 3,340. The Financial Times and The Guardian are in the dataset but outside the top 15. AI models prefer depth over brand when it comes to media.

Finding #3: The Long Tail Is Massive — 73.5% of Citations Come from Outside the Top 100

This might be the most actionable finding in the entire study.

Domain Concentration

Top 10 domains: 17.4% of citations
Top 50 domains: 23.1% of citations
Top 100 domains: 26.5% of citations
The other 566,900 domains: 73.5% of citations

63% of domains appear only once in the entire dataset. AI models actively seek niche, specific sources.

Unlike Google search, where page 1 is dominated by a handful of authority domains, AI search has a radically different distribution. Nearly three-quarters of all citations come from domains outside the top 100. And 63% of the 567,000 unique domains in our dataset appeared only once.

What does this mean? AI models are doing deep, specific research. They don't default to citing the same 50 domains for every query. They look for the best answer to each specific question, and often that means finding a niche blog post, a technical documentation page, or a specific product page that directly answers the prompt.

This is fundamentally good news for smaller brands. You don't need to out-authority Forbes or Wikipedia. You need to be the best answer to specific questions in your niche.
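For readers who want to run the same concentration check on their own citation logs, the sketch below shows one way to compute a top-N share and the fraction of domains cited only once. The function names and the toy counts are hypothetical; this is not the study's pipeline, only the arithmetic behind figures like "top 100 domains = 26.5%" and "63% of domains appear once".

```python
from collections import Counter


def top_n_share(domain_counts: Counter, n: int) -> float:
    """Fraction of all citations that go to the n most-cited domains."""
    total = sum(domain_counts.values())
    top = sum(count for _, count in domain_counts.most_common(n))
    return top / total


def singleton_fraction(domain_counts: Counter) -> float:
    """Fraction of domains that were cited exactly once."""
    singles = sum(1 for count in domain_counts.values() if count == 1)
    return singles / len(domain_counts)


# Toy data for illustration only, not figures from the study.
toy = Counter({"a.example": 5, "b.example": 3, "c.example": 1, "d.example": 1})
print(top_n_share(toy, 1))      # 0.5
print(singleton_fraction(toy))  # 0.5
```

The long-tail share is simply `1 - top_n_share(counts, 100)`.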

Finding #4: Blog Content Is the Most Cited Identifiable Format (15.4%)

When we analyzed URL patterns to classify content formats, blog and article content emerged as the largest specific format at 15.4% of all citations, second only to general corporate and product pages:

Content Format Distribution

Corporate / Product Pages: 1,177,900 (55.8%)
Blog / Article: 325,880 (15.4%)
Forum / Discussion: 172,790 (8.2%)
Product / E-commerce: 138,520 (6.6%)
Wiki / Reference: 79,510 (3.8%)
News: 65,880 (3.1%)
Video: 65,120 (3.1%)
Documentation / Guide: 59,350 (2.8%)
Research / Academic: 15,340 (0.7%)
Review / Comparison: 9,310 (0.4%)

The key insight: AI models love content that directly answers questions. Blog posts, guides, and documentation are structured to explain things. Corporate product pages are cited most in total volume because they describe what a company does. But blog content is cited at a disproportionately high rate relative to how much of it exists on the web.

Forums (8.2%) confirm the Reddit finding. E-commerce product pages (6.6%) show that AI models cite specific products when users ask buying questions. And the 2.8% documentation/guides figure suggests that technical documentation is underrated as a GEO (generative engine optimization) asset.
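Classifying formats from URL patterns can be sketched in a few lines of Python. The rules below are hypothetical examples chosen for illustration; the actual patterns used for this study are not published here, and treating unmatched URLs as corporate or product pages is an assumption.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rules, checked in order. Each pattern matches a whole path segment.
FORMAT_RULES = [
    ("Blog / Article", re.compile(r"/(blog|articles?|posts?)(/|$)")),
    ("Documentation / Guide", re.compile(r"/(docs?|documentation|guides?)(/|$)")),
    ("Forum / Discussion", re.compile(r"/(forums?|community|r|threads?)(/|$)")),
    ("News", re.compile(r"/news(/|$)")),
]

# Assumption: anything that matches no rule falls into the general bucket.
DEFAULT_FORMAT = "Corporate / Product Pages"


def classify_format(url: str) -> str:
    """Return a content-format label based on the URL path alone."""
    path = urlsplit(url).path.lower()
    for label, pattern in FORMAT_RULES:
        if pattern.search(path):
            return label
    return DEFAULT_FORMAT


print(classify_format("https://example.com/blog/ai-search"))      # Blog / Article
print(classify_format("https://www.reddit.com/r/SEO/comments/1"))  # Forum / Discussion
print(classify_format("https://example.com/pricing"))              # Corporate / Product Pages
```

A classifier like this only sees the URL, so a blog post published at a path without `/blog/` lands in the default bucket. That is one reason a URL-based blog share is better read as a lower bound.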

Finding #5: Citation Volume Grew 6x in 5 Months

AI search is not a static channel. The volume of citations in our dataset grew dramatically month over month:

Monthly Citation Volume (sources extracted)

Nov 2025: 123,410
Dec 2025: 139,280
Jan 2026: 443,900
Feb 2026: 598,230
Mar 2026: 757,520

April 2026 data is partial (first 3 days only).

From about 123K sources in November 2025 to about 758K in March 2026: a 6.1x increase across five months of data. This growth reflects both increasing adoption of our platform and the general expansion of AI search as a discovery channel.

The jump from December to January is particularly notable — a 3.2x increase in a single month. This coincides with the period when multiple AI providers released major model updates and expanded their web search capabilities.

The takeaway: AI search is growing fast as a traffic and visibility channel, even allowing for our own platform's growth. Brands that aren't monitoring their AI citations today are blind to a rapidly scaling discovery channel.
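The growth factors quoted above can be reproduced directly from the monthly totals. The short Python sketch below uses only the published figures; the variable names are ours.

```python
# Monthly citation volume as published in this article (April is partial and excluded).
monthly = {
    "2025-11": 123_410,
    "2025-12": 139_280,
    "2026-01": 443_900,
    "2026-02": 598_230,
    "2026-03": 757_520,
}

months = list(monthly)

# Overall growth from the first to the last full month.
overall = monthly[months[-1]] / monthly[months[0]]

# Growth factor for each consecutive pair of months.
step_growth = {
    f"{prev} -> {cur}": monthly[cur] / monthly[prev]
    for prev, cur in zip(months, months[1:])
}

print(f"Nov to Mar: {overall:.1f}x")
for step, factor in step_growth.items():
    print(f"{step}: {factor:.1f}x")
```

Run on these figures, the December to January step is the 3.2x jump, while the other month-over-month steps come out between roughly 1.1x and 1.3x.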

Finding #6: UGC's Share of AI Citations Is Rising

When we look at how each source type's share of total citations shifted over the study period, one trend stands out: UGC went from under 5% to over 20% of all citations.

Source Type Share Over Time

Share of total citations per month, Nov 2025 vs Mar 2026:

Corporate: 62.3% to 52.5% (-9.8pp)
UGC: 4.9% to 20.6% (+15.7pp)
Institutional: 14.2% to 12.0% (-2.2pp)
Editorial: 15.9% to 10.6% (-5.3pp)
Reference: 2.7% to 4.3% (+1.6pp)

UGC's share quadrupled from 4.9% to 20.6%, while Corporate content's share dropped from 62.3% to 52.5%. This shift is driven almost entirely by Reddit and YouTube — platforms where AI models find real user opinions, product comparisons, and experience-based answers that corporate pages typically lack.

For brands, this means your community presence is part of your AI visibility footprint. What people say about you on Reddit, forums, and review sites directly influences whether AI models recommend you.
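Two different measures appear in this section: the change in percentage points (pp) and the relative change ("quadrupled"). The sketch below computes both from the published November and March shares; the `share_shift` helper is a hypothetical name used for illustration.

```python
# Share of total citations per source type, as published in this article.
nov_2025 = {"Corporate": 62.3, "UGC": 4.9, "Institutional": 14.2,
            "Editorial": 15.9, "Reference": 2.7}
mar_2026 = {"Corporate": 52.5, "UGC": 20.6, "Institutional": 12.0,
            "Editorial": 10.6, "Reference": 4.3}


def share_shift(before: dict, after: dict) -> dict:
    """Map each source type to (percentage-point change, relative ratio)."""
    return {
        name: (after[name] - before[name], after[name] / before[name])
        for name in before
    }


shift = share_shift(nov_2025, mar_2026)
for name, (pp_change, ratio) in shift.items():
    print(f"{name}: {pp_change:+.1f}pp, {ratio:.2f}x")
```

The two measures answer different questions: Corporate lost the most share in absolute terms (-9.8pp), but UGC changed the most in relative terms (about 4.2x).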

Finding #7: YouTube Is a Universal Authority Signal

YouTube appeared across 120 out of 1,280 projects — the widest project spread of any domain. For comparison, Reddit appeared in 114 projects and Wikipedia in 115. But YouTube's spread is unique because it crosses every industry vertical.

Most Universal Domains (by number of projects they appear in)

youtube.com (video content across all industries): 120 projects
en.wikipedia.org (foundational knowledge/entity definitions): 115 projects
reddit.com (community opinions and comparisons): 114 projects
fr.wikipedia.org (French-language Wikipedia): 68 projects
linkedin.com (professional profiles and company pages): 63 projects
es.wikipedia.org (Spanish-language Wikipedia): 55 projects
apps.apple.com (App Store listings): 53 projects
wired.com (technology journalism): 51 projects
forbes.com (business journalism): 48 projects
medium.com (long-form blog content): 47 projects

The "universality" metric matters because it shows which domains AI models trust regardless of topic. YouTube, Wikipedia, and Reddit are essentially default trust anchors for AI search. If you have a presence on these platforms, you increase your odds of appearing in AI responses across any industry.
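The universality metric differs from raw citation counts: a domain cited 1,000 times in one project still has a spread of 1. A minimal Python sketch of the idea follows, assuming each citation is available as a (project, domain) pair. The function name and sample records are hypothetical.

```python
from collections import defaultdict


def project_spread(citations):
    """Count the number of distinct projects in which each domain is cited."""
    projects_by_domain = defaultdict(set)
    for project_id, domain in citations:
        projects_by_domain[domain].add(project_id)
    return {domain: len(projects) for domain, projects in projects_by_domain.items()}


# Hypothetical (project_id, domain) records, for illustration only.
records = [
    ("p1", "youtube.com"),
    ("p1", "youtube.com"),   # repeat citations in one project count once
    ("p2", "youtube.com"),
    ("p2", "reddit.com"),
    ("p3", "youtube.com"),
]

print(project_spread(records))
# {'youtube.com': 3, 'reddit.com': 1}
```

Using a set per domain is what separates spread from volume: duplicates within a project collapse to a single entry.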

Finding #8: .com Dominates, but Country TLDs Reveal AI's Geographic Reach

As expected, .com domains account for the majority of citations (62%). But the geographic distribution from country-code TLDs is revealing:

Top Country TLDs (non-.com)

.fr (France): 235,530
.uk (United Kingdom): 55,360
.pt (Portugal): 30,170
.pl (Poland): 21,520
.de (Germany): 21,020
.au (Australia): 20,830
.br (Brazil): 16,410
.ca (Canada): 11,730
.in (India): 10,660
.es (Spain): 9,160

France's outsized representation (.fr at 235,530 citations) reflects the geographic composition of our user base, but it also reveals something important: AI models cite local-language content when the prompt is in that language. French prompts pull French sources. Portuguese prompts pull Portuguese sources.

This has major implications for international brands. Your AI visibility is language-specific. Having a great English website doesn't help when a French user asks ChatGPT about your product in French. You need localized content to appear in localized AI responses.
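A TLD breakdown like the one above can be approximated by taking the last label of each cited hostname. The sketch below is one simple way to do it, not necessarily how the study counted: `tld_of` is a hypothetical helper, and grouping `.co.uk` hosts under `.uk` is an assumption.

```python
from collections import Counter
from urllib.parse import urlsplit


def tld_of(url: str) -> str:
    """Return the last label of the hostname, e.g. 'fr' for www.lemonde.fr."""
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1]


# Hypothetical cited URLs, for illustration only.
cited = [
    "https://www.lemonde.fr/economie/",
    "https://fr.wikipedia.org/wiki/Paris",
    "https://www.bbc.co.uk/news",
    "https://www.shopify.com/blog",
]

tld_counts = Counter(tld_of(u) for u in cited)
print(tld_counts)
```

Note that the TLD is only a rough proxy for language: `fr.wikipedia.org` is French-language content on a `.org` domain, so a language-level analysis would need more than the hostname suffix.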

Finding #9: AI Strongly Prefers Recent Content

Among sources where we could extract a publication date, the recency bias is overwhelming:

Source Publication Year Distribution

2026: 64,760
2025: 160,930
2024: 48,820
2023: 20,710
2022: 13,910
2021: 7,860
2020: 5,710
2019 and earlier: 12,290

Note: 80% of sources had no extractable publication date and are excluded from this chart.

2025 and 2026 content together account for 67% of all datable sources. Content from 2024 makes up 15%, and each earlier year accounts for a single-digit share. The message is clear: among sources with a known date, freshness looks like a major signal for AI citation.

This doesn't mean old content is worthless — Wikipedia articles from 2008 still get cited. But for competitive queries, AI models strongly prefer content published or updated within the last 12-18 months.
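The recency shares can be checked against the published year counts. This short Python sketch uses only the figures from the chart above; the variable names are ours.

```python
# Datable sources by publication year, as published in this article.
by_year = {
    "2026": 64_760,
    "2025": 160_930,
    "2024": 48_820,
    "2023": 20_710,
    "2022": 13_910,
    "2021": 7_860,
    "2020": 5_710,
    "2019 and earlier": 12_290,
}

# Total number of sources with an extractable publication date.
datable = sum(by_year.values())

# Percentage share of each year among datable sources.
share = {year: 100 * count / datable for year, count in by_year.items()}
recent = share["2025"] + share["2026"]

print(f"Datable sources: {datable:,}")
print(f"2025 + 2026: {recent:.0f}%")
print(f"2024: {share['2024']:.0f}%")
```

These percentages describe only the sources with an extractable date, so they say nothing about the undated majority of the dataset.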

What This Means for Your Brand

Based on 2.1 million data points, here are the actionable takeaways:

1. Your website IS your AI visibility strategy.

Corporate content is 60.6% of all citations. Invest in clear, authoritative, question-answering pages on your own domain.

2. Be the best answer, not the biggest brand.

73.5% of citations come from the long tail. AI doesn't care about your domain authority — it cares about whether you answer the specific question.

3. Monitor Reddit and community platforms.

UGC's share of AI citations quadrupled from 5% to 20%. What people say about you on Reddit is now part of your AI visibility footprint.

4. Publish regularly and update existing content.

67% of datable sources were published in 2025 or 2026. Freshness is a major signal.

5. Go multilingual if you serve international markets.

AI search is language-specific. French prompts cite French sources. You need content in every language you want visibility in.

6. YouTube and Wikipedia are universal trust anchors.

These platforms appear in more projects than any other domain, regardless of industry (YouTube in 120 of 1,280 projects, English Wikipedia in 115). Having a presence on them amplifies your AI citations.

How We Track This

Geonimo is the platform behind this research. We monitor AI search visibility daily for brands by querying ChatGPT, Perplexity, Claude, and Gemini with real-world prompts, then extracting and analyzing every source they cite.

If you want to see where your brand appears (and where it doesn't) across AI search engines, you can start a free trial. Setup takes 2 minutes.

Methodology note: All data comes from Geonimo's production analytics pipeline. Sources are extracted from AI model responses using automated NLP processing. Source types (Corporate, Editorial, UGC, Institutional, Reference) are classified using domain-level heuristics. Content format classification uses URL pattern analysis. All numbers in this article have been rounded. The dataset covers November 2025 through April 3, 2026.

Guillaume Rufenacht

CEO at Geonimo

Guillaume Rufenacht is the CEO and founder of Geonimo, the AI search visibility platform. He writes about GEO strategy, AI search trends, and how brands can optimize their presence across ChatGPT, Perplexity, Claude, and Google AI.
