
llms.txt

llms.txt is a proposed convention in which a site publishes a markdown file at its root that gives large language models a curated map of its most important content. The goal is to help AI systems find canonical pages efficiently. Adoption is growing, but it is not an official standard, and major AI engines have not committed to honoring it.

What the proposal actually is

The llms.txt proposal places a markdown file at yoursite.com/llms.txt containing a structured summary of the site: what it is, plus curated links to key documentation, product pages and guides, optionally with companion markdown versions of important pages. The idea is that LLMs working within a limited context window get a clean, token-efficient map instead of parsing navigation-heavy HTML. Think of it as a sitemap for reasoning engines rather than crawlers: prioritized, annotated and human-readable.
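
To make that structure concrete from the consumer's side, here is a minimal Python sketch of how an agent might read such a file. It assumes the layout described in the public proposal (an H1 title, a blockquote summary, then `##` sections containing `- [title](url): note` links). The function name `parse_llms_txt`, the sample content and the example.com URLs are hypothetical, and real consumers may parse the file differently.

```python
import re

# Matches markdown list items of the form "- [Title](url): optional note".
LINK_RE = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)"
    r"(?:\s*:\s*(?P<note>.*))?$"
)


def parse_llms_txt(text):
    """Group annotated links under the nearest preceding '## ' heading."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                note = (match["note"] or "").strip()
                sections[current].append((match["title"], match["url"], note))
    return sections


# Hypothetical file content, for illustration only.
SAMPLE = """# Example Co

> Example Co builds a hypothetical widget API.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and first request
- [API reference](https://example.com/docs/api.md)

## Optional

- [Changelog](https://example.com/changelog.md): release history
"""

if __name__ == "__main__":
    for heading, links in parse_llms_txt(SAMPLE).items():
        print(heading, [title for title, _url, _note in links])
```

The result is a small dictionary of prioritized links, which is the "clean map" the proposal is aiming for.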

The honest status: promising, unproven

As of mid-2026, llms.txt remains a community convention, not a ratified standard. Thousands of sites, particularly developer tool companies, have published one, and some AI coding assistants and agents read them. But OpenAI, Anthropic, Google and Perplexity have not committed to honoring llms.txt in their crawlers or search products, and Google representatives have publicly downplayed it. Treat claimed visibility gains skeptically: there is no verified evidence that publishing llms.txt improves citations in major AI engines today. It does not replace robots.txt, which remains the established access-control mechanism: compliance is voluntary, but major crawler operators honor it.

Should you publish one?

The cost is about an hour of curation; the downside is essentially zero. For documentation-heavy sites, llms.txt has immediate utility for the AI agents and developer tools that already consume it, and it positions you well if major engines adopt the convention. Keep it short, link only canonical resources, and update it when key pages change. Be honest about where it ranks in your GEO strategy: crawlability, server-rendered content and structured data are proven levers; llms.txt is a cheap, speculative complement, not a substitute.

Frequently asked questions

Do ChatGPT, Claude or Perplexity actually read llms.txt?

There is no public commitment from OpenAI, Anthropic, Google or Perplexity to honor llms.txt in their search or crawling systems. Some AI agents, coding assistants and smaller tools do consume it. Publish it as a low-cost hedge, but do not expect measurable citation gains from it alone today.

Is llms.txt a replacement for robots.txt or sitemap.xml?

No. robots.txt tells crawlers which paths they may access, and it is the mechanism major bots check, although compliance is voluntary. Sitemaps enumerate URLs for indexing. llms.txt serves a different purpose: a curated, annotated content guide optimized for LLM consumption. All three can coexist; only the first two have established roles that major crawlers and search engines act on.

What should I include in an llms.txt file?

A one-line description of your site, a short summary section, then grouped markdown links to your most authoritative pages: documentation, product overviews, pricing and key guides, each with a brief annotation. Keep it concise and canonical. Quality of curation matters more than completeness, since the file's value is signal density.
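
As a rough illustration of that layout, the following Python sketch renders a site name, a one-line summary and grouped, annotated links into llms.txt-style markdown. The function name `render_llms_txt`, the section names and the example.com URLs are hypothetical; the output format follows the public proposal as commonly described, not a ratified specification.

```python
def render_llms_txt(name, summary, sections):
    """Render a site name, summary and grouped links as llms.txt markdown.

    `sections` maps a heading to a list of (title, url, note) tuples;
    an empty note omits the annotation.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{title}]({url}){suffix}")
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical site data, for illustration only.
    print(render_llms_txt(
        "Example Co",
        "Example Co builds a hypothetical widget API.",
        {
            "Docs": [
                ("Quickstart", "https://example.com/docs/quickstart.md", "install and first request"),
                ("Pricing", "https://example.com/pricing.md", "plans and limits"),
            ],
        },
    ))
```

Generating the file from a short, hand-maintained list keeps the emphasis on curation: anything not worth a line in that list probably does not belong in the file.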

Related terms

robots.txt

robots.txt is a plain-text file at a website's root that tells crawlers which parts of the site they may access, using User-agent and Disallow/Allow directives. Originally built for search engines, it is now the primary mechanism for controlling AI crawlers like GPTBot and PerplexityBot. Compliance is voluntary but honored by major operators.
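
A short sketch of how those directives are evaluated, using Python's standard-library `urllib.robotparser`. The rules and example.com URLs below are hypothetical and exist only to show the mechanics; a well-behaved crawler performs an equivalent check before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: restrict GPTBot to public paths, block
# PerplexityBot entirely, and allow every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(robots_txt, user_agent, url):
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "PerplexityBot", "SomeOtherBot"):
        for path in ("/docs/", "/private/report"):
            url = "https://example.com" + path
            print(agent, path, can_fetch(ROBOTS_TXT, agent, url))
```

Because compliance is voluntary, this check describes what a cooperative crawler will do, not what a server enforces.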

AI Crawler

An AI crawler is an automated bot operated by an AI company that fetches web pages to collect training data for language models or to retrieve fresh content for AI search answers. Examples include GPTBot, ClaudeBot and PerplexityBot. Each identifies itself with a user agent string and can be allowed or blocked via robots.txt.
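
As a minimal sketch of that self-identification, the Python snippet below looks for the crawler tokens named above inside a request's User-Agent header, for example when reading server logs. The function name and the sample header strings are hypothetical, and the token list is deliberately limited to the three examples in this entry. User agent strings can be spoofed, so treat a match as a hint rather than proof.

```python
# Tokens taken from the examples in this entry; not an exhaustive list.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def identify_ai_crawler(user_agent):
    """Return the first known crawler token found in user_agent, else None."""
    lowered = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in lowered:
            return token
    return None


if __name__ == "__main__":
    # Hypothetical header values, for illustration only.
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.0; +https://example.com/bot)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    ]
    for ua in samples:
        print(identify_ai_crawler(ua))
```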

Context Window

The context window is the maximum amount of text, measured in tokens, a language model can consider at once: the system prompt, conversation history, retrieved documents, and its own output. It limits how many sources an AI engine can read per answer, making the competition to be among those few sources intense.
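
The budgeting effect can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular engine works: token counts are approximated as four characters per token (real tokenizers vary), sources are assumed to arrive already ranked, and the names `estimate_tokens` and `select_sources` are hypothetical.

```python
def estimate_tokens(text):
    """Rough token estimate. Assumption: about 4 characters per token."""
    return max(1, len(text) // 4)


def select_sources(sources, context_window, reserved):
    """Keep ranked sources that fit the budget left after reserved tokens.

    `sources` is a list of (name, text) pairs in ranked order; `reserved`
    covers the system prompt, conversation history and the model's output.
    """
    budget = context_window - reserved
    chosen = []
    for name, text in sources:
        cost = estimate_tokens(text)
        if cost <= budget:
            chosen.append(name)
            budget -= cost
    return chosen


if __name__ == "__main__":
    # Hypothetical documents of 4,000, 8,000 and 2,000 characters.
    ranked = [("a", "x" * 4000), ("b", "x" * 8000), ("c", "x" * 2000)]
    print(select_sources(ranked, context_window=3000, reserved=1000))
```

With a 3,000-token window and 1,000 tokens reserved, only "a" and "c" fit: the long document "b" is skipped even though it ranks second, which is why concise, token-efficient pages have an advantage.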

GEO Strategy

A GEO strategy is a structured plan for growing a brand's visibility in AI-generated answers. It defines target prompts and engines, establishes measurement baselines, prioritizes content and authority initiatives to win specific answer placements, and sets a continuous loop of monitoring, diagnosis, and optimization across platforms like ChatGPT, Perplexity, and Gemini.

Last updated: 2026-06-11
