Temperature (LLM)

Temperature is a setting that controls how random a language model's output is during generation: low values produce consistent, predictable answers, higher values produce varied, creative ones. It is a key reason the same prompt about a product category can name different brands on different runs of the same AI.

How temperature controls generation

At each step of inference, the model assigns a probability to every possible next token. Temperature reshapes that distribution before sampling by dividing the model's raw scores (logits) by the temperature value before they are converted to probabilities. Near zero, the model almost always picks the most probable token, yielding nearly deterministic output; higher values flatten the distribution, letting less likely tokens through and increasing variety.
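
The reshaping step can be sketched in a few lines, assuming the standard formulation in which raw scores (logits) are divided by the temperature and then passed through a softmax. The function name and the three example logits are illustrative and not taken from any particular model or product.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        # Temperature 0 is usually handled separately as a plain argmax.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.5, 0.5]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lowest temperature the top token takes almost all of the probability mass; at the highest, the three candidates sit much closer together.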

Consumer chat products typically run at moderate temperatures for natural-sounding conversation, which means answers, including which brands get named, vary meaningfully between runs of the identical prompt.

Temperature and brand mention variability

For recommendation queries, several brands often sit close together in probability. At non-zero temperature, sampling decides which subset appears in any given answer, so you might be named in six runs out of ten. Your true visibility is that frequency, not the binary outcome of one check. This is the statistical root of answer volatility.

A strong brand association in training data and retrieved sources raises your baseline probability, so you survive sampling more often. Weakly associated brands flicker in and out; strongly associated ones appear consistently at any reasonable temperature.
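
The effect can be illustrated with a toy simulation. It is a deliberate simplification: real models sample tokens, not whole brands, and the brand names, association scores, and the two-slots-per-answer assumption below are all hypothetical.

```python
import math
import random

def sample_brands(scores, temperature, slots, rng):
    """Pick `slots` distinct brands, weighted by temperature-scaled scores."""
    remaining = dict(scores)
    picked = []
    for _ in range(min(slots, len(remaining))):
        names = list(remaining)
        peak = max(remaining.values())
        # Softmax weights; subtracting the peak keeps exp() numerically stable.
        weights = [math.exp((remaining[n] - peak) / temperature) for n in names]
        choice = rng.choices(names, weights=weights, k=1)[0]
        picked.append(choice)
        del remaining[choice]
    return picked

def mention_frequency(scores, temperature, slots, runs, seed=0):
    """Fraction of simulated runs in which each brand is named."""
    rng = random.Random(seed)
    counts = {name: 0 for name in scores}
    for _ in range(runs):
        for name in sample_brands(scores, temperature, slots, rng):
            counts[name] += 1
    return {name: counts[name] / runs for name in scores}

# Hypothetical association scores: one strong brand, three weaker ones.
scores = {"StrongCo": 3.0, "BrandB": 1.0, "BrandC": 1.0, "BrandD": 0.8}
for t in (0.5, 1.0, 1.5):
    print(t, mention_frequency(scores, t, slots=2, runs=2000))
```

In this setup the strongly associated brand is named in most runs at every temperature tried, while the weaker brands trade places from run to run.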

What marketers should take from this

First, never judge AI visibility from a single query: sampling noise will mislead you in both directions. Second, aim to raise your underlying probability: a dominant share of voice in source material makes your brand a high-probability continuation that the model selects at low and high temperatures alike.

Measurement must match the mechanism: repeated daily sampling of the same prompts, aggregated into mention rates. Geonimo's tracking is built on exactly this principle, surfacing your stable visibility level beneath run-to-run randomness.
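
As a rough sketch of that aggregation, the snippet below turns a batch of collected answers into a mention rate and an approximate sampling error. It assumes the runs are independent and that a case-insensitive substring match is good enough to detect a mention; the brand names and answers are invented, and this is not a description of how any particular tracking product works.

```python
import math

def mention_rate(answers, brand):
    """Share of answers that name the brand (case-insensitive substring match)."""
    if not answers:
        raise ValueError("need at least one answer")
    hits = sum(1 for text in answers if brand.lower() in text.lower())
    return hits / len(answers)

def standard_error(rate, n):
    """Approximate sampling error of a rate estimated from n independent runs."""
    return math.sqrt(rate * (1 - rate) / n)

# Hypothetical answers collected from ten runs of one prompt.
answers = ["Try Acme or Bolt."] * 6 + ["Consider Bolt and Cirrus."] * 4
rate = mention_rate(answers, "Acme")
error = standard_error(rate, len(answers))
print(f"mention rate: {rate:.0%}, standard error: {error:.2f}")
```

With only ten runs, a brand named six times has an estimated rate whose standard error is roughly 15 percentage points, which is why aggregating many runs over many days gives a steadier figure than any single check.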

Frequently asked questions

What does temperature do in an LLM?

Temperature scales the randomness of token sampling during generation. Low temperature makes the model pick its most probable words, giving consistent answers; high temperature admits less likely words, giving varied, creative output. Chat products use moderate values, so identical prompts produce noticeably different answers across runs.

Is temperature why ChatGPT gives different brand recommendations each time?

It is a major reason. With several brands at similar probability, sampling at non-zero temperature selects different subsets per run. Model routing and retrieval changes add further variation. Brands with stronger underlying associations get sampled more often, which is why visibility should be measured as a frequency.

Can I control the temperature of public AI platforms?

Not in consumer products: platforms fix their own settings. API users can set temperature for their applications. For marketers, the lever is not the dial but the distribution: strengthening your brand's presence in training data and retrievable sources raises your probability of being named at any temperature.

Related terms

Inference (LLM)

Inference is the runtime process where a trained language model generates output, predicting tokens one by one in response to a prompt. Every AI answer users see is an inference run. Its cost and latency constraints explain why engines retrieve few sources, summarize aggressively, and cache answers, all of which shape brand visibility.

Answer Volatility

Answer volatility is the tendency of AI engines to give different answers to the same prompt across runs, days and models, caused by sampling temperature, model updates and changing retrieval results. It makes single spot-checks unreliable for measuring AI visibility and is the core reason repeated daily sampling is required.

Share of Voice (AI Search)

Share of voice in AI search is the percentage of brand mentions a company captures out of all brand mentions in AI-generated answers for a defined set of prompts. If AI engines produce 200 brand mentions across your tracked prompts and 50 name your brand, your AI share of voice is 25 percent.
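
The arithmetic in this definition can be written as a small helper. The function name and the way the other 150 mentions are split between competitors are hypothetical; only the 50-out-of-200 figure comes from the example above.

```python
def share_of_voice(mention_counts, brand):
    """Brand's mentions divided by all brand mentions in the tracked answers."""
    total = sum(mention_counts.values())
    if total == 0:
        raise ValueError("no brand mentions recorded")
    return mention_counts.get(brand, 0) / total

# 200 brand mentions in total, 50 of them for the tracked brand.
counts = {"YourBrand": 50, "CompetitorA": 90, "CompetitorB": 60}
print(f"{share_of_voice(counts, 'YourBrand'):.0%}")
```

Note that the denominator here is the total number of brand mentions, whereas mention rate divides by the number of answers, so the two metrics can differ for the same data.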

Mention Rate

Mention rate is the percentage of AI-generated answers that name a specific brand, measured across a tracked set of prompts. If a brand is mentioned in 30 of 100 answers collected for its tracked prompts, its mention rate is 30 percent. It is the foundational metric for quantifying brand presence in AI search.

Last updated: 2026-06-11
