Michał Suski, co-founder and Head of Innovation at Surfer, presented a deep dive into advanced SEO strategies, focusing on adapting content for both Google and Large Language Models (LLMs). His company has processed over 100 billion tokens through OpenAI, providing a large dataset for correlation studies.
Key Metrics & Business Rationale
The drive to integrate LLMs into content strategy is financially motivated:
- Customer Acquisition: 22% of Surfer's customers discovered the product through ChatGPT or other AI models.
- Conversion: Google still delivers roughly 30 times more traffic and 25 times more conversions, but the conversion rates of Google and AI-driven traffic are comparable. This justifies incorporating LLM signals.
- Data Scale: Correlation studies utilized a large keyword sample, although Suski noted that similar results could be achieved with 10,000 or 100,000 keywords.
SEO Ranking Factor Correlation Findings
The analysis of a million-keyword sample (de-duplicated via clustering) revealed significant changes in ranking factors:
| Factor | Key Finding | Correlation Strength |
| --- | --- | --- |
| Topical Coverage | The strongest success factor; comprehensive coverage of relevant facts and entities. | Highest |
| Domain Traffic/Authority | Remains a strong correlating factor, though slightly declining. | Strong |
| Keyword Density (Exact Match) | Correlates around zero; move away from obsession with exact matches. | Zero |
| Keyword Variations | Outperform exact keyword density in paragraphs. | Stronger than Exact |
| Headings (H2/H3/H4) | Variations outperform exact matches; avoid forcing keywords in every subheading. | Stronger than Exact |
| Content Length | Non-causal correlation; an outcome of covering enough entities/facts, not a goal. | Contextual |
| Bolding | Bolding keywords and variations shows a modest positive correlation (~5%). | Modest Positive |
| URL/Domain | Exact Match Domains (EMD) and Partial Match Domains (PMD) show strong correlation. | Strong |
| Page Performance | HTML size and server responsiveness are more critical than overall page size. | Strong |
| Schema Markup | Negative correlation with the number of different schema types (avoid excessive, abusive schema). | Negative |
| AI Content Detection | No correlation between AI-detected content and rankings. | None |
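To make the keyword findings above concrete, here is a minimal sketch of how exact-match density and variation coverage can be measured for a page. The keyword, variation list, and sample text are placeholders, and Surfer's actual scoring is not public, so treat this as an illustration of the metrics, not their method.

```python
import re

def exact_match_density(text: str, keyword: str) -> float:
    """Share of the page's words taken up by exact keyword matches."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    kw = keyword.lower().split()
    hits = sum(
        1 for i in range(len(words) - len(kw) + 1)
        if words[i:i + len(kw)] == kw
    )
    return hits * len(kw) / max(len(words), 1)

def variation_coverage(text: str, variations: list[str]) -> float:
    """Fraction of known keyword variations that appear at least once."""
    lower = text.lower()
    return sum(v.lower() in lower for v in variations) / max(len(variations), 1)

page = "Mirrorless cameras dominate 2025. A mirrorless camera body pairs well with compact primes."
print(exact_match_density(page, "mirrorless camera"))  # exact-match share of words
print(variation_coverage(page, ["mirrorless cameras", "camera body", "compact primes"]))
```

Per the findings above, the second number is the one worth optimizing; the first should simply not be obsessed over.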
Optimizing for LLMs: The Core Source & IRC
The second part of the analysis involved reverse-engineering LLMs by examining 50 million AI Overview answers and 400,000 model prompts.
The “Core Source” Concept
- Definition: URLs that are consistently cited across repeated AI responses to the same question; only 10-30% of cited sources remain stable from run to run.
- Key Differentiators for Core Sources:
- Topical Coverage: Core sources cover 42% of facts vs. 34% for volatile sources and 23% for non-cited pages. This is the most significant factor.
- Google Rank: Core sources typically rank higher (average position 8) than volatile/non-core sources (average position 20).
- Semantic Proximity: Close alignment between URL slug/title and the prompt intent significantly increases the chance of inclusion.
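Semantic proximity is easy to sanity-check for your own pages by embedding the slug/title and the target prompt and comparing them. The talk did not prescribe tooling; the sketch below assumes the sentence-transformers library with an off-the-shelf model.

```python
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

prompt = "what is the best mirrorless camera for travel"
candidates = [
    "best-mirrorless-camera-for-travel | Best Mirrorless Cameras for Travel (2025)",
    "blog/post-1234 | Our Thoughts on Photography Gear",
]

prompt_vec = model.encode(prompt, normalize_embeddings=True)
cand_vecs = model.encode(candidates, normalize_embeddings=True)

# Cosine similarity between the prompt and each slug + title string;
# higher proximity should mean a better chance of inclusion.
for cand, score in zip(candidates, util.cos_sim(prompt_vec, cand_vecs)[0]):
    print(f"{score:.2f}  {cand}")
```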
Information Retrieval Cost (IRC)
IRC is defined as the token cost for an LLM to extract the necessary information from content. Suski introduced this as “the new crawl budget” for AI optimization.
- Goal: Provide maximum information with minimal tokens (a toy comparison is sketched after this list).
- Model-Specific Priorities: Different LLMs prioritize content structure differently:
- ChatGPT: Values early query confirmation but largely ignores author credibility (E-E-A-T).
- Google’s AI Mode: Prioritizes early answers, progression from general to specific, and author credibility (E-E-A-T).
- Perplexity & AI Overview: Identified as most susceptible to content manipulation due to broad sensitivity to content-driven signals.
Action Items
Based on the findings, Michał Suski recommends the following actionable steps:
- Prioritize Topical Coverage: Focus on covering contextual facts and entities; top-ranking pages reach roughly 74% coverage, which is the benchmark to aim for.
- Optimize for LLMs (IRC): Create content with a low information retrieval cost by being direct and structured, minimizing engaging filler (jokes, sidetracks). Provide maximum information with minimal tokens.
- Evolve Keyword Strategy: Move away from exact keyword density toward using variations in content, especially in H3/H4 subheadings.
- Implement Bolding: Bold keywords and variations as a low-effort optimization with positive correlation.
- Domain Strategy: Use Exact Match Domains (EMD) or Partial Match Domains (PMD) for new SEO projects.
- Audit Schema: Avoid going “crazy with schema” by limiting the use of different, irrelevant schema types.
- Core Source Strategy: Identify stable core sources before pursuing mentions or backlinks, as these are the most valuable citation targets (a stability check is sketched after this list).
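The core-source recommendation can be operationalized by re-asking the same prompt several times and measuring which cited URLs persist. The helper get_cited_urls below is hypothetical; wire it to whichever engine you are auditing.

```python
from collections import Counter

def get_cited_urls(prompt: str) -> set[str]:
    """Hypothetical helper: URLs cited in one AI answer to `prompt`.
    Implement against whichever engine you are auditing."""
    raise NotImplementedError

def find_core_sources(prompt: str, runs: int = 10, threshold: float = 0.8) -> dict:
    """Re-ask the same prompt and split citations into core vs. volatile."""
    counts = Counter()
    for _ in range(runs):
        counts.update(get_cited_urls(prompt))
    stability = {url: n / runs for url, n in counts.items()}
    return {
        "core": [u for u, s in stability.items() if s >= threshold],
        "volatile": [u for u, s in stability.items() if s < threshold],
    }
```

Given the 10-30% stability figure, expect the "core" list to be a small fraction of everything cited.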
My Take: What This Means for Solo Publishers
I attended Michał Suski’s talk at Chiang Mai SEO 2025, and this was one of the presentations that actually changed how I think about content strategy. Not because it introduced completely new concepts—topical authority has been discussed for years—but because Surfer’s data finally quantifies what many of us suspected.
Who Should Care About This
Solo publishers and small teams: This is directly applicable. You don’t need enterprise tools or a team of 20 to implement topical coverage strategies. In fact, smaller sites can pivot faster than large organizations.
Affiliate marketers: The shift from exact-match keywords to entity coverage explains why thin “best X” posts are dying. Your roundups need to comprehensively cover the product category, not just list items with affiliate links.
NOT primarily for agencies: While agencies can use this data, the real advantage goes to operators who can execute quickly without client approval cycles.
What I’m Actually Implementing
1. Topical Coverage Audits
The 74% entity coverage benchmark is now my target for pillar content. You can measure this with NLP entity extraction tools, competitor content analysis, and “People Also Ask” clustering.
For my photography sites, this means every camera roundup needs to cover: sensor specs, autofocus systems, video capabilities, ergonomics, price positioning, and competitor comparisons—not just “top 5 cameras.”
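Here is a minimal version of that audit. It uses spaCy's named-entity recognizer as the extraction tool (my choice, not something the talk prescribed) and treats the union of competitor entities as the target set; the file names are placeholders.

```python
import spacy  # pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")

def entities(text: str) -> set[str]:
    """Lowercased named entities found in the text."""
    return {ent.text.lower() for ent in nlp(text).ents}

def topical_coverage(my_text: str, competitor_texts: list[str]) -> float:
    """Share of the competitors' combined entity set that my page covers."""
    target = set().union(*(entities(t) for t in competitor_texts))
    return len(entities(my_text) & target) / len(target) if target else 0.0

# Benchmark from the talk: top-ranking pages reach roughly 0.74.
competitors = [open(p).read() for p in ["comp1.txt", "comp2.txt"]]
print(topical_coverage(open("my_page.txt").read(), competitors))
```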
2. Information Retrieval Cost (IRC) Optimization
This is the actionable insight I’m most excited about. Michał’s concept of IRC as “the new crawl budget” for AI is brilliant. Here’s how I’m applying it:
- Front-load answers: The first 100 words should directly address the query. No “In this article, we’ll explore…” preambles (a quick check is sketched after this list).
- Structure for extraction: Use clear H2/H3 hierarchy that mirrors how someone would ask follow-up questions.
- Kill the filler: Those “engaging” introductions and jokes? They increase token cost for LLMs without adding information value.
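A crude way to enforce the front-loading rule in a publishing pipeline is to check whether the opening words already carry the query's key terms. This heuristic and its threshold are mine, not Michał's:

```python
import re

def front_loads_answer(article: str, query: str, window: int = 100) -> bool:
    """True if most distinct query terms appear within the first `window` words."""
    opening = " ".join(re.findall(r"\w+", article.lower())[:window])
    terms = {t for t in re.findall(r"\w+", query.lower()) if len(t) > 3}
    if not terms:
        return True
    hit_rate = sum(t in opening for t in terms) / len(terms)
    return hit_rate >= 0.6  # arbitrary cutoff; tune to taste

print(front_loads_answer(open("draft.md").read(), "best mirrorless camera for travel"))
```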
For a deeper technical framework on structuring content for LLM extraction, see The Technical Framework for LLM Content Optimization.
3. The EMD/PMD Signal
The strong correlation for Exact Match and Partial Match Domains is interesting but requires context. This works for new projects—I wouldn’t recommend rebranding an established site just for this signal. But for future microsites or programmatic plays, domain naming matters more than current wisdom suggests.
What I’m Ignoring
Schema proliferation: Michał’s finding that excessive schema types correlate negatively with rankings validates my minimalist approach. I use Article, Product, and FAQ schema where genuinely applicable—not every schema type just because a plugin suggests it.
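If you want to audit this on your own pages, counting the distinct @type values in a page's JSON-LD blocks is a reasonable proxy. A minimal sketch with requests and BeautifulSoup (the URL is a placeholder):

```python
import json
import requests
from bs4 import BeautifulSoup  # pip install requests beautifulsoup4

def schema_types(url: str) -> set[str]:
    """Collect every @type declared in the page's JSON-LD blocks."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    types: set[str] = set()

    def walk(node):
        if isinstance(node, dict):
            t = node.get("@type")
            if isinstance(t, str):
                types.add(t)
            elif isinstance(t, list):
                types.update(x for x in t if isinstance(x, str))
            for v in node.values():
                walk(v)
        elif isinstance(node, list):
            for v in node:
                walk(v)

    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            walk(json.loads(tag.string or ""))
        except json.JSONDecodeError:
            pass
    return types

found = schema_types("https://example.com/some-post")
print(f"{len(found)} distinct types: {sorted(found)}")  # fewer is safer here
```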
AI content detection concerns: The zero correlation between AI detection and rankings is reassuring but not surprising. Google has said they care about quality, not provenance. I still blend AI drafts with human expertise, but I’m not paranoid about detection tools.
The Bigger Picture
What Suski is really describing is the transition from “ranking documents” to “ranking information.” Google’s AI systems (and ChatGPT, Perplexity, etc.) don’t care about your beautiful website design or clever copywriting. They’re extracting facts and entities to synthesize answers.
This is good news for publishers who focus on genuine expertise. It’s bad news for those who’ve built businesses on keyword-stuffed content with minimal actual information.
To audit and measure E-E-A-T signals systematically, Tom Winter’s methodology provides a practical LLM-based approach. And for building the topical architecture that supports comprehensive coverage, see Koray Tugberk Gubur’s topical map strategies.
