Context Window

The context window is the maximum amount of text, measured in tokens, that a language model can consider at once: the system prompt, conversation history, retrieved documents, and its own output. It caps how many sources an AI engine can read per answer, which makes competition for those few slots intense.

What fits in the window

Every answer is generated from whatever occupies the context window at that moment: instructions, the user's conversation, and, in search products, the retrieved passages. Modern frontier models offer windows of hundreds of thousands of tokens, with some reaching a million or more, yet production systems deliberately fill only a fraction, because cost and latency scale with context length.
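The budget arithmetic behind this can be sketched in a few lines. This is a minimal illustration, not any engine's actual accounting: the class name, the field names, and every number below (a 200,000-token window, a 4 percent fill cap, 400-token passages) are assumptions chosen only to show the shape of the calculation.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Hypothetical accounting of one request's context window."""
    window_tokens: int            # model's maximum context size
    system_prompt_tokens: int     # instructions from the platform
    history_tokens: int           # the user's conversation so far
    reserved_output_tokens: int   # space kept free for the generated answer

    def room_for_sources(self) -> int:
        """Tokens left for retrieved passages after everything else."""
        used = (self.system_prompt_tokens + self.history_tokens
                + self.reserved_output_tokens)
        return max(0, self.window_tokens - used)


def sources_that_fit(budget: ContextBudget, tokens_per_passage: int,
                     fill_percent: int = 100) -> int:
    """Number of equal-sized passages that fit when only `fill_percent`
    of the remaining room is used (an assumed cost/latency cap)."""
    usable = budget.room_for_sources() * fill_percent // 100
    return usable // tokens_per_passage


# Illustrative numbers only.
budget = ContextBudget(window_tokens=200_000, system_prompt_tokens=2_000,
                       history_tokens=3_000, reserved_output_tokens=4_000)
print(budget.room_for_sources())                                   # 191000
print(sources_that_fit(budget, tokens_per_passage=400, fill_percent=4))  # 19
```

Under these assumed numbers, a window with room for hundreds of passages ends up holding about nineteen, which is the gap between theoretical capacity and what a cost-conscious system actually injects.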

Models also attend unevenly across long contexts: information in the middle of a very long context can be effectively ignored. Engines therefore curate a small, high-relevance context rather than dumping everything in.

The visibility consequence: few seats per answer

When an AI engine answers a commercial query, the context typically holds passages from perhaps five to twenty sources, not five hundred. Your content either makes that shortlist, selected by retrieval and reranking, or contributes nothing to the answer. This is a far harsher cutoff than page-one rankings ever were.

It also means partial wins are possible: one excellent passage from your site can sit in context alongside competitors' pages, earning you a mention or citation within an answer largely built from other sources.
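A rough sketch of how such a shortlist might be assembled is shown here. It assumes each candidate passage already carries a relevance score from an earlier retrieval and reranking stage, and it uses a simple greedy rule; real engines may use different selection logic. The `Passage` type, the `build_shortlist` function, the site names, and the scores are all hypothetical.

```python
from typing import List, NamedTuple


class Passage(NamedTuple):
    source: str       # hypothetical site name
    tokens: int       # passage length in tokens
    relevance: float  # score from an assumed retrieval/reranking stage


def build_shortlist(passages: List[Passage], token_budget: int,
                    max_sources: int) -> List[Passage]:
    """Greedily keep the highest-scoring passages that still fit."""
    chosen: List[Passage] = []
    used = 0
    seen = set()
    for p in sorted(passages, key=lambda p: p.relevance, reverse=True):
        # Skip passages from a new source once the source cap is reached.
        if p.source not in seen and len(seen) >= max_sources:
            continue
        # Skip passages that would overflow the token budget.
        if used + p.tokens > token_budget:
            continue
        chosen.append(p)
        used += p.tokens
        seen.add(p.source)
    return chosen


candidates = [
    Passage("competitor-a.example", 300, 0.92),
    Passage("your-site.example", 250, 0.88),
    Passage("competitor-b.example", 400, 0.81),
    Passage("your-site.example", 350, 0.60),
    Passage("competitor-c.example", 200, 0.55),
]
shortlist = build_shortlist(candidates, token_budget=1000, max_sources=3)
print([p.source for p in shortlist])
```

In this toy run, one passage from your-site.example lands in context next to two competitors, while its weaker second passage and a fourth source are cut: a partial win and a hard cutoff in the same answer.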

Designing content for context inclusion

Optimize for being the most useful few hundred tokens on the topic: direct answers, unique data, and a clear structure that retrieval systems can chunk cleanly. Redundant pages competing with each other for the same slot help nobody; one authoritative page per intent concentrates your chances.
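To see why clear structure matters, consider a simplified chunker of the kind a retrieval pipeline might run. This is a sketch under stated assumptions, not a description of any specific engine: it splits on blank lines, packs whole paragraphs into chunks up to a size limit, and counts whitespace-separated words as a crude stand-in for a real tokenizer. The function name and the 300-unit default are illustrative.

```python
from typing import Callable, List


def chunk_paragraphs(document: str, max_tokens: int = 300,
                     count_tokens: Callable[[str], int] = lambda s: len(s.split()),
                     ) -> List[str]:
    """Pack whole paragraphs into chunks of at most `max_tokens`.

    `count_tokens` defaults to a word count, which is only a stand-in
    for a real tokenizer. A single paragraph longer than the limit is
    kept whole rather than split mid-thought.
    """
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for para in paragraphs:
        size = count_tokens(para)
        # Start a new chunk when adding this paragraph would overflow.
        if current and current_tokens + size > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Three 100-word paragraphs with a 200-unit limit.
paragraph = " ".join(["word"] * 100)
page = "\n\n".join([paragraph, paragraph, paragraph])
chunks = chunk_paragraphs(page, max_tokens=200)
print(len(chunks))  # 2
```

A page whose paragraphs each answer one question survives this kind of splitting intact; a page where the answer is spread across many paragraphs may have it divided between chunks, and only one of them may be retrieved.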

Because the shortlist changes per query and per day, visibility is a rate, not a state. Geonimo measures how often your content makes it into answers across engines and prompts, turning context-window competition into a trackable share-of-answer metric.
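One way to picture visibility as a rate is the calculation below. It is a generic illustration and not Geonimo's actual methodology: the observation format, the `share_of_answer` function, and the engine and site names are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def share_of_answer(observations: Iterable[Tuple[str, Set[str]]],
                    brand: str) -> Dict[str, float]:
    """Fraction of sampled answers, per engine, that cite `brand`.

    Each observation is (engine name, set of sources cited in one answer).
    """
    total: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for engine, cited_sources in observations:
        total[engine] += 1
        if brand in cited_sources:
            hits[engine] += 1
    return {engine: hits[engine] / total[engine] for engine in total}


# Hypothetical sample: one entry per (prompt, day) answer that was checked.
sample = [
    ("engine-a", {"your-site.example", "x.example"}),
    ("engine-a", {"x.example"}),
    ("engine-b", {"your-site.example"}),
    ("engine-b", {"your-site.example", "y.example"}),
    ("engine-a", {"your-site.example"}),
    ("engine-a", set()),
]
rates = share_of_answer(sample, "your-site.example")
print(rates)  # {'engine-a': 0.5, 'engine-b': 1.0}
```

Because the inputs are sampled answers, the result is an estimate that shifts as prompts, days, and engines are added, which is why it is tracked over time rather than checked once.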

Frequently asked questions

What is a context window in simple terms?

It is the model's working memory: the maximum amount of text (instructions, conversation, retrieved documents) it can consider while generating a response, measured in tokens. Anything outside the window does not exist for that answer. Large windows exist, but AI search engines still inject only a small, curated set of sources.

How does the context window affect my brand's AI visibility?

Each AI answer is built from the handful of source passages that fit the engine's curated context, often from fewer than twenty pages. If retrieval does not select your content for that shortlist, you cannot be cited or accurately represented. Winning those few context slots is the core competition of AI search.

Do bigger context windows mean AI reads my whole website?

No. Even million-token windows are rarely filled in search products, because cost rises and both speed and attention quality degrade as context grows. Engines retrieve and inject selected chunks, typically a few hundred tokens per source. Strong individual passages matter far more than total site length.

Related terms

Token (LLM)

A token is the basic unit of text a language model processes, typically a word fragment averaging about four characters, or roughly three-quarters of a word, in English. Models read and generate text in tokens, and providers price usage by the token. Token limits shape how much of a web page an AI can ingest when composing an answer.
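Those two rules of thumb translate into quick estimates. The helpers below are a back-of-envelope sketch for English text only; real counts depend on the specific tokenizer, and the function names are hypothetical.

```python
def tokens_from_chars(char_count: int) -> int:
    """Estimate tokens at roughly four characters per token."""
    return char_count // 4


def tokens_from_words(word_count: int) -> int:
    """Estimate tokens at roughly three-quarters of a word per token,
    which is about four tokens for every three words."""
    return word_count * 4 // 3


# A 1,500-word article, assumed to be about 8,000 characters long.
print(tokens_from_words(1_500))  # 2000
print(tokens_from_chars(8_000))  # 2000
```

By this estimate, a passage of a few hundred tokens is only a few short paragraphs of such an article.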

Retrieval-Augmented Generation (RAG)

Retrieval-augmented generation is an AI architecture where a language model retrieves relevant documents, typically via web or database search, before generating its answer, grounding the response in fetched content. RAG powers AI search engines like Perplexity and ChatGPT Search, and it is the mechanism through which web pages earn citations in AI answers.

Reranking

Reranking is a second-pass scoring step in retrieval pipelines where a specialized model re-orders initially retrieved documents by true relevance to the query before the best few are passed to the language model. It is the final filter deciding which sources an AI answer actually uses and cites.
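The second-pass idea can be sketched as follows. Production rerankers use a trained relevance model; the word-overlap score here is a deliberately crude stand-in so the example stays self-contained, and the function names and sample texts are hypothetical.

```python
from typing import List


def overlap_score(query: str, document: str) -> float:
    """Toy relevance: share of query words that appear in the document.
    A stand-in for the specialized model a real reranker would use."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def rerank(query: str, candidates: List[str], keep: int) -> List[str]:
    """Re-order first-pass candidates by score and keep the best few."""
    ordered = sorted(candidates,
                     key=lambda doc: overlap_score(query, doc),
                     reverse=True)
    return ordered[:keep]


first_pass = [
    "pricing plans for our product",
    "the context window limits how much text a model reads",
    "window cleaning services",
    "context window size limits explained",
]
top = rerank("context window size limits", first_pass, keep=2)
print(top)
```

Here the last candidate matches all four query words and the second matches three, so those two are passed on while the rest are dropped before the language model sees them.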

System Prompt

A system prompt is the hidden instruction set a platform gives its language model before any user input, defining behavior, tone, safety rules, and how to use tools like web search and citations. It silently shapes every AI answer, including whether and how brands get recommended, compared, and sourced.

Last updated: 2026-06-11
