How to Get Cited by ChatGPT & Perplexity: A GEO Playbook

The definition of GEO is the easy part. The hard part is the question a solo publisher actually asks at the end of it: fine — so what do I change on the page? This is the answer to that question. Not the theory of why citations matter, but the working playbook for earning one — written from the seat of someone who runs the site, not someone selling a course about it.

Two things up front. First, ChatGPT and Perplexity are not the same machine wearing different paint. They retrieve differently, cite differently, and reward slightly different work — so a few moves below are platform-specific. Second, there are no secret files and no magic markup here. Anyone promising a llms.txt cheat code is selling the part of GEO that didn’t survive contact with Google’s own documentation. The real levers are unglamorous and they overlap heavily with good SEO. That’s the point.

How each engine actually picks a source

You can’t optimize for a box you can’t see into, so here’s the mechanism for each, in plain terms.

Perplexity is retrieval-first. Almost every answer is grounded in a live search. It runs your prompt, fans it into sub-queries, pulls a set of pages with its own crawler (PerplexityBot) and partner indexes, and synthesizes an answer with numbered citations sitting right next to the claims. Because it leans so hard on live retrieval, a freshly published, cleanly structured page can show up in a Perplexity answer within days. It is the more winnable of the two for a small site — and the one where citations are most visible, so it’s also the easiest to measure.

ChatGPT is memory-first, retrieval-second. For a lot of prompts it answers straight from training weights and never searches at all — which means being in the training data (old, widely-linked, frequently-discussed) carries weight no on-page tweak can buy this quarter. When it does browse (SearchGPT / the browsing tool), it leans on a Bing-derived index and cites fewer sources than Perplexity — often one or two. The bar to be that one source is higher, and classical search eligibility (Bing included, not just Google) matters more than people expect.

The shared mechanic underneath both is passage-level retrieval-augmented generation: the engine doesn’t grade your whole page, it lifts the specific section that answers the sub-query and grounds on that. Everything below follows from that one fact. The deeper version of this mechanism is in LLM-driven SEO; here we’re staying operational.

The seven moves

A seven-step staircase from retrieved to cited: eligibility, passage focus, unique content, clean entity, off-site mentions, extraction structure, and citation measurement. — The seven moves as a climb from *retrieved* to *cited* — start with crawler eligibility at the bottom and work up through passage-level answers, originality a model can't synthesize, clean entity resolution, off-site mentions, extractable structure, and finally measuring the citation itself. Each step is necessary; skipping the first silently kills the rest.

1. Be eligible — confirm the AI crawlers can actually read you

This is the step everyone skips and the one that silently kills the rest. If the bots can’t fetch and render your page, nothing else on this list matters.

Don’t block the retrieval crawlers in robots.txt. The ones that matter for citation (as opposed to training) are OAI-SearchBot and ChatGPT-User for ChatGPT, and PerplexityBot for Perplexity. Blocking GPTBot only opts you out of training — fine if that’s your call — but blocking the search agents opts you out of being cited at all.
Serve the answer in the initial HTML. If your content only appears after client-side JavaScript runs, a retriever may grab an empty shell. Static or server-rendered wins here — which, not coincidentally, is why this site is built the way it is.
Stay snippet-eligible in classical search. Per Google’s own guidance, the AI surfaces run on the same core index — if a page can’t be surfaced normally, no answer engine reaches it either.

2. Win the passage, not the page

Front-load the answer. The single highest-leverage habit in this whole playbook is putting a direct, quotable sentence at the top of each section, before the throat-clearing. The model lifts the unit that resolves the sub-query cleanly; a conclusion buried in paragraph four costs you the citation even when your paragraph four is the best on the internet.

Make each H2 self-contained — one question, fully answered, no “as we discussed above.” A retriever pulls the block out of context, so the block has to survive out of context. This is the entire thesis of the context-density framework, and it’s the difference between being retrievable and being retrieved.

3. Be the thing the model can’t synthesize without you

If your page recombines the top ten results, the engine recombines them itself and skips your link — it doesn’t need a middleman for its own training data. What earns a citation is what the model can’t generate on its own: primary data, a first-hand test, a number nobody else published, a genuinely held opinion. The audit I ran on this site was cited by Perplexity precisely because it had original numbers attached to it. “The AI doesn’t need to retrieve what it can already produce” is the whole game in one line.

4. Resolve as a clean entity

The engine has to know who you are and what this page is, with zero ambiguity, before it will trust you enough to name you. That means consistent authorship across the site, a real About page the model can anchor a Person entity to, and sameAs links tying your identity to the places it already trusts. This is the work that moves a source from retrieved to trusted — the EEAT signals, measured the way Tom Winter audits them with LLMs, are entity-resolution signals as much as quality ones.

5. Earn the off-site mentions the retriever reads

Here’s the uncomfortable one for a content purist: a meaningful share of what answer engines cite for “best X” / “is Y any good” queries is Reddit, forums, and discussion threads, because that’s where the model finds corroborating human consensus. You don’t control those pages, but you influence whether your name appears in them. Being discussed where the model looks is now part of the surface area — the entity-everywhere logic that the local-SEO mention-wheel and the diversification playbook both circle. Borrowed authority still buys retrieval even when it doesn’t buy a byline; just don’t mistake the impression for a click.

6. Structure for extraction — without the snake oil

Use the schema you’d use anyway — Article, Person, FAQPage where a real Q&A exists, BreadcrumbList. It clarifies entities and gives the retriever clean structure to parse. What to skip: speculative “AI-only” markup, bought brand mentions, and any vendor telling you there’s a special file that ranks you in LLMs. Those were the easy-to-sell parts of GEO and the parts that Google explicitly says you don’t need. Clean, boring, standard structured data — yes. Magic — no.

7. Measure the citation, not a proxy

The only ground truth is asking the engine. Once a week, run your target queries through ChatGPT (with browsing on) and Perplexity and log a simple yes/no: did it name me, and for which query? Free GEO scoring tools are useful for surfacing signal gaps, but they grade correlates of citability, not citations themselves — mine scored this site 59/100 while Perplexity was already citing it for three of four queries. Trust the actual answer over the score. (Standing up that weekly citation log is exactly the measurement loop this site is building toward at the end of the current sprint.)

The platform cheat-sheet

If you only remember one thing per engine:

For Perplexity — publish clean and publish often. It’s retrieval-hungry and shows its citations, so a well-structured new page can earn a slot fast, and you can see it happen. Best near-term ROI for a small site. Front-loaded answers + original data + crawlable HTML is the whole recipe.
For ChatGPT — play the long game on being known. It answers from memory more than it searches, so durable wins come from being widely discussed and well-linked over time, plus staying eligible in Bing’s index for the moments it does browse. You won’t game your way into ChatGPT’s weights this month; you earn your way in over quarters.

The honest bottom line

None of this is a parallel universe to SEO. It’s the same craft with the win condition moved one notch: from rank #1 to get cited. The bar that crept up is real — sections that stand alone, entities that resolve, answers worth quoting, originality a model can’t fake — but it’s a content-quality bar, not a secret. That’s the through-line of the whole AI Search · GEO channel, and it’s why the name on the door still fits even as the meaning of ranking quietly changes underneath it.

This playbook reflects how ChatGPT and Perplexity behave as of mid-2026; both are moving targets. Start with the GEO definition pillar, then the audit of this site for what the moves above looked like in practice.

Frequently asked questions

Which crawlers do I need to allow to get cited by ChatGPT and Perplexity?

The ones that matter for citation are OAI-SearchBot and ChatGPT-User for ChatGPT, and PerplexityBot for Perplexity — don’t block those in robots.txt. Blocking GPTBot only opts you out of training, which is a separate call; blocking the search agents opts you out of being cited at all. If the bots can’t fetch and render your page, nothing else matters.

Is ChatGPT or Perplexity easier to get cited by for a small site?

Perplexity is the more winnable of the two. It’s retrieval-first, grounding almost every answer in a live search, so a freshly published, cleanly structured page can show up within days — and because its numbered citations are visible, it’s also the easiest to measure. ChatGPT is memory-first and answers from training weights more than it searches, so durable wins there come over quarters rather than this month.

Why does front-loading the answer matter for AI citations?

Because both engines use passage-level retrieval-augmented generation — they lift the specific section that answers the sub-query rather than grading your whole page. A direct, quotable sentence at the top of each section, inside a self-contained H2, survives being pulled out of context. A conclusion buried in paragraph four costs you the citation even when that paragraph is the best on the internet.

How do I know if an AI engine is actually citing my page?

Ask the engine directly. Once a week, run your target queries through ChatGPT with browsing on and through Perplexity, and log a simple yes/no on whether it named you and for which query. GEO scoring tools grade correlates of citability rather than citations themselves — the audit of this site scored 59/100 while Perplexity was already citing it for three of four queries — so trust the actual answer over the score.