
Crawl Budget

Crawl budget is the number of pages a crawler will fetch from a site within a given period, shaped by the site's server capacity and the crawler's assessment of its value. Originally a Google Search concept, it now extends to AI crawlers, which typically fetch fewer pages and prioritize fresh, authoritative content.

The classic concept, extended to AI

Search engines ration crawling along two axes: crawl rate is capped by how fast your server responds without strain, and crawl demand reflects how valuable and frequently updated your pages appear. AI crawlers inherit the same economics with tighter constraints. Training crawlers sample sites rather than mirror them exhaustively, and retrieval crawlers like OAI-SearchBot prioritize pages likely to answer user questions. If your most citable content sits deep in a bloated site, AI crawlers may spend their entire visit on paginated archives, parameter URLs and duplicates before reaching it.

What wastes crawl budget

The usual suspects: infinite URL spaces from faceted navigation and tracking parameters, duplicate content without canonicals, soft 404s, redirect chains, and slow server responses that throttle crawl rate. For AI crawlers, JavaScript rendering adds a brutal twist: most of them do not execute JavaScript at all, so every fetch of a client-rendered page returns an empty shell and the request is wasted. Sites with thousands of thin pages, common in careless programmatic SEO, lower the odds that any given crawl reaches the pages that actually earn citations.
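
To make the tracking-parameter problem concrete, here is a minimal Python sketch that collapses URL variants onto one canonical form and reports which groups of URLs are really the same page. The parameter list, the function names and the example URLs are illustrative assumptions, not a standard; adapt them to the parameters your own site actually emits.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: an illustrative list of tracking parameters, not an exhaustive one.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "gclid", "fbclid",
}


def canonicalize(url: str) -> str:
    """Drop tracking parameters and the fragment, and sort what remains."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", urlencode(kept), "")
    )


def duplicate_groups(urls):
    """Map each canonical URL to the crawled variants that collapse onto it."""
    groups = {}
    for url in urls:
        groups.setdefault(canonicalize(url), []).append(url)
    return {canon: found for canon, found in groups.items() if len(found) > 1}


# Hypothetical URLs, as they might appear in a crawl log.
CRAWLED = [
    "https://example.com/pricing",
    "https://example.com/pricing?utm_source=newsletter",
    "https://Example.com/pricing?fbclid=abc123#plans",
    "https://example.com/guide?b=2&a=1",
]
DUPLICATES = duplicate_groups(CRAWLED)
```

In this sample, three of the four fetches land on the same pricing page, so two of them were spent on duplicates. Running the same grouping over real log data gives a rough estimate of how many requests parameter variants absorb.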

Optimizing crawl budget for AI visibility

Concentrate crawler attention on your best answers. Flatten architecture with strong internal linking so key pages sit within a couple of clicks of the homepage, prune or noindex thin pages, fix redirect chains, serve fast server-rendered HTML, and keep sitemaps clean and current. Then measure where AI crawlers actually spend their requests: page-level bot logs reveal whether GPTBot and PerplexityBot hit your money pages or burn budget on archives. Geonimo's AI traffic analytics breaks down crawl activity per bot and per page, exposing exactly this distribution.
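
One way to check the "couple of clicks" rule is a breadth-first search over your internal link graph. The sketch below is a rough illustration in Python: it assumes you already have the graph as a dictionary mapping each page to the pages it links to (for example from a site crawl export), and the names `click_depths`, `too_deep` and the sample paths are hypothetical.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: minimum number of clicks from home to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links, home, key_pages, max_depth=2):
    """Return key pages deeper than max_depth; unreachable pages map to None."""
    depths = click_depths(links, home)
    return {
        page: depths.get(page)
        for page in key_pages
        if depths.get(page) is None or depths[page] > max_depth
    }


# Hypothetical link graph: the guide is only reachable through blog pagination.
LINKS = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/blog/page/2"],
    "/blog/page/2": ["/guide"],
}
PROBLEMS = too_deep(LINKS, "/", ["/pricing", "/guide", "/orphan"])
# PROBLEMS == {"/guide": 3, "/orphan": None}
```

Here the guide sits three clicks deep behind pagination and the orphan page is not linked at all. A direct link from the homepage or a hub page would fix both.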

Frequently asked questions

Do AI crawlers have a crawl budget like Googlebot?

They behave as if they do, fetching a limited number of pages per site based on perceived value, freshness and server responsiveness, though none publish formal budget documentation. In practice AI crawlers fetch far fewer pages than Googlebot, which makes concentrating their attention on your best content even more important.

How do I see what AI crawlers spend their budget on?

Analyze server or CDN logs filtered by AI user agents and grouped by URL. Look for budget waste: parameter URLs, paginated archives, redirect chains. If your highest-value pages receive few AI bot hits while junk URLs receive many, restructure internal links and prune crawl traps to shift that attention toward the pages that matter.
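
As a rough illustration of that analysis, the Python sketch below counts hits per bot and per URL. It assumes access logs in the common "combined" format and matches bots by a simple substring of the user agent; the sample lines, including their user agent strings, are invented for the example. Substring matching can be fooled by spoofed user agents, so treat the counts as an estimate.

```python
import re
from collections import Counter

# Bot names mentioned in this article; extend the tuple as needed.
AI_BOTS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")

# Assumption: combined log format, i.e.
# ip - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def ai_hits_by_url(lines):
    """Count requests per (bot, path) for user agents that name a known AI bot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        agent = match["agent"].lower()
        bot = next((name for name in AI_BOTS if name.lower() in agent), None)
        if bot is not None:
            hits[(bot, match["path"])] += 1
    return hits


# Invented sample lines; real user agent strings are longer.
SAMPLE_LINES = [
    '203.0.113.5 - - [10/Jun/2026:10:00:00 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "ExampleAgent GPTBot"',
    '203.0.113.5 - - [10/Jun/2026:10:00:01 +0000] "GET /blog?page=9 HTTP/1.1" 200 2048 "-" "ExampleAgent GPTBot"',
    '203.0.113.5 - - [10/Jun/2026:10:00:02 +0000] "GET /blog?page=9 HTTP/1.1" 200 2048 "-" "ExampleAgent GPTBot"',
    '198.51.100.7 - - [10/Jun/2026:10:00:03 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "ExampleAgent PerplexityBot"',
    '192.0.2.9 - - [10/Jun/2026:10:00:04 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "ExampleBrowser"',
]
HITS = ai_hits_by_url(SAMPLE_LINES)
```

Sorting the result with `HITS.most_common()` puts the most-fetched URLs first, which makes it easy to see whether archives and parameter URLs outrank the pages you want cited.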

Does site speed affect AI crawl budget?

Yes. Crawlers throttle their request rate to avoid straining slow servers, so faster response times allow more pages to be fetched per visit. Fast, server-rendered HTML benefits AI crawlers doubly, since most cannot execute JavaScript and rely entirely on the initial HTML response being complete and quick.

Related terms

AI Crawler

An AI crawler is an automated bot operated by an AI company that fetches web pages to collect training data for language models or to retrieve fresh content for AI search answers. Examples include GPTBot, ClaudeBot and PerplexityBot. Each identifies itself with a user agent string and can be allowed or blocked via robots.txt.

JavaScript Rendering

JavaScript rendering refers to content being generated in the browser by JavaScript after the initial HTML loads, as in single-page applications. Most AI crawlers do not execute JavaScript, so client-side-rendered content is invisible to them. Server-side rendering or pre-rendering is essential for AI search visibility.
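
A quick way to approximate what a non-rendering crawler sees is to count the visible words in the raw HTML response, without running any scripts. The following Python sketch uses only the standard library; the 50-word threshold and the function names are arbitrary assumptions for illustration, not an established benchmark.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text a non-JavaScript client would see in the raw HTML."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script/style bodies and whitespace-only text nodes.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_like_empty_shell(html: str, min_words: int = 50) -> bool:
    """Assumption: fewer than min_words visible words suggests a client-rendered shell."""
    return len(visible_text(html).split()) < min_words


# Hypothetical responses: a single-page-app shell and a server-rendered page.
SHELL = (
    '<html><head><title>App</title><script>var x = "lots of words here";</script>'
    '</head><body><div id="root"></div></body></html>'
)
RENDERED = "<html><body><h1>Crawl budget</h1><p>" + "word " * 60 + "</p></body></html>"
```

Feed the function the body of a plain HTTP response for each key URL. Pages flagged as shells are candidates for server-side rendering or pre-rendering.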

Internal Linking

Internal linking is the practice of linking pages within the same website to each other, using descriptive anchor text. It distributes authority across pages, establishes topical relationships, guides users through related content, and helps both search crawlers and AI crawlers discover, contextualize, and correctly interpret a site's structure.

robots.txt

robots.txt is a plain-text file at a website's root that tells crawlers which parts of the site they may access, using User-agent and Disallow/Allow directives. Originally built for search engines, it is now the primary mechanism for controlling AI crawlers like GPTBot and PerplexityBot. Compliance is voluntary but honored by major operators.
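
The directives can be checked before deployment with Python's standard `urllib.robotparser` module. The rules below are a hypothetical example (the `/search`, `/tag/` and `/cart` paths are assumptions, not recommendations for every site). Note that this parser applies the first matching rule within a group, which is why the `Disallow` lines come before the catch-all `Allow`; individual crawlers may resolve conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep GPTBot out of crawl traps, allow everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /search
Disallow: /tag/
Allow: /

User-agent: *
Disallow: /cart
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


PARSER = build_parser(ROBOTS_TXT)

# GPTBot may fetch the guide but not the internal search results.
GUIDE_ALLOWED = PARSER.can_fetch("GPTBot", "https://example.com/guide")
SEARCH_ALLOWED = PARSER.can_fetch("GPTBot", "https://example.com/search?q=budget")

# Bots without their own group fall back to the "*" rules.
CART_ALLOWED = PARSER.can_fetch("PerplexityBot", "https://example.com/cart")
```

Checking a handful of key URLs and known crawl traps this way catches the common mistake of blocking the very pages you want AI crawlers to reach.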

Last updated: 2026-06-11
