AI Bot Rules
Rules in robots.txt and HTTP headers to control AI crawler behavior.
AI bot rules are instructions that let website owners control AI crawler behavior. This is done primarily via robots.txt, but also via HTTP headers and meta tags. With these rules you determine which AI bots may index, scrape, or use your content for training.
Known AI bots
The most important AI bots are: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), Google-Extended (Google AI training), Applebot-Extended (Apple Intelligence), CCBot (Common Crawl), and Amazonbot. Each bot has its own user-agent string.
Strategic choices
Do you block AI bots entirely, or allow them selectively? Many businesses choose a middle ground: they allow crawling for AI visibility but block training-specific bots. The right strategy depends on your goals: do you want to be cited by AI, or do you want to protect your content?
Reference table: known AI bots
| User-agent | Owner | Purpose | Respects robots.txt |
|---|---|---|---|
| GPTBot | OpenAI | Web crawling for ChatGPT and AI products | Yes |
| OAI-SearchBot | OpenAI | ChatGPT Search (real-time search results) | Yes |
| ChatGPT-User | OpenAI | Fetching pages when a user shares a URL in ChatGPT | Yes |
| ClaudeBot | Anthropic | Training and improvement of Claude models | Yes |
| PerplexityBot | Perplexity AI | Real-time search results in Perplexity | Yes |
| Google-Extended | Google | AI training (Gemini), not for regular Google Search | Yes |
| Googlebot | Google | Regular search index (incl. AI Overviews) | Yes |
| Applebot-Extended | Apple | Apple Intelligence and Siri training | Yes |
| Applebot | Apple | Siri and Spotlight suggestions | Yes |
| CCBot | Common Crawl | Open dataset, used by many AI models for training | Yes |
| Amazonbot | Amazon | Alexa answers and Amazon AI products | Yes |
| Bytespider | ByteDance | TikTok search and AI training | Partially |
| FacebookBot | Meta | Content preview and AI training | Yes |
| Diffbot | Diffbot | Structured data extraction for AI Knowledge Graphs | Yes |
| cohere-ai | Cohere | Training of Cohere's language models | Yes |
| anthropic-ai | Anthropic | Web research for Claude | Yes |
Robots.txt templates for AI bots
Strategy 1: Allow all (maximum AI visibility)
# Allow all AI bots for maximum visibility
# in AI answers and search results
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: CCBot
Allow: /
User-agent: Amazonbot
Allow: /
Strategy 2: Selective access (balance visibility/protection)
# Allow AI search engines, block training bots
# Balance between visibility in AI answers
# and protection against unauthorized training
# Allow: bots that cite your content with source attribution
User-agent: GPTBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Amazonbot
Allow: /
# Block: bots that primarily train without citation
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
Strategy 3: Block all (maximum content protection)
# Block all known AI bots
# Note: this significantly reduces your visibility
# in AI answers
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: Diffbot
Disallow: /
What does our scanner check?
The scanner analyzes your robots.txt for AI bot rules. We check which AI crawlers you allow and which you block, and whether you have a deliberate strategy (rather than no rules at all). This is part of both the AEO score and the Agent Readiness score.
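A check like this can be approximated with Python's standard-library robots.txt parser. The sketch below is my own simplification, not the scanner's actual logic: it classifies a handful of well-known AI bots as allowed or blocked for the site root. Note that a bot with no rules at all comes out as "allowed", since robots.txt permits by default:

```python
from urllib.robotparser import RobotFileParser

# A small sample of the AI bots discussed above.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"]

def audit_robots_txt(robots_txt: str) -> dict[str, str]:
    """Label each known AI bot as 'allowed' or 'blocked' for the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Bots without a matching group (and no restrictive * group)
    # default to allowed, per the robots.txt convention.
    return {
        bot: "allowed" if parser.can_fetch(bot, "/") else "blocked"
        for bot in AI_BOTS
    }

example = """\
User-agent: GPTBot
Allow: /

User-agent: CCBot
Disallow: /
"""
report = audit_robots_txt(example)
```

Running the audit on the example yields `GPTBot: allowed`, `CCBot: blocked`, and `allowed` for the three bots the file never mentions.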
Frequently asked questions
Should I allow or block AI bots?
This depends on your goals. If you want to be cited in AI answers (AEO), allow at minimum GPTBot, PerplexityBot, and ClaudeBot. If you want to protect your content from AI training without citation, block Google-Extended, CCBot, and Bytespider. Most businesses choose a middle ground: allow search-related bots, block training-only bots.
Do AI bots actually respect robots.txt?
Major AI companies (OpenAI, Anthropic, Google, Perplexity) respect robots.txt. This is in their own interest: if their bots ignored robots.txt, site owners would respond by blocking them outright, harming the ecosystem they depend on. Smaller or lesser-known bots are less reliable. Robots.txt is a convention, not legal protection.
Can I block AI bots with HTTP headers instead of robots.txt?
Yes. You can use the X-Robots-Tag HTTP header for specific pages or file types, which provides more granular control than robots.txt (robots.txt only works at the path level). Be aware that "noai" and "noimageai" are non-standard, community-proposed directives: some crawlers honor them, but unlike "noindex" they are not universally supported. The meta tag <meta name="robots" content="noai"> works similarly at the page level, with the same caveat.
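As a sketch, a per-path header policy might look like this in Python (the path prefixes and the `build_headers` helper are hypothetical examples; again, "noai"/"noimageai" are non-standard directives that only some crawlers recognize):

```python
def build_headers(path: str) -> dict[str, str]:
    """Choose an X-Robots-Tag value per path prefix (illustrative policy)."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if path.startswith("/research/"):
        # Non-standard directives: signal "do not use for AI training"
        # to the crawlers that recognize them.
        headers["X-Robots-Tag"] = "noai, noimageai"
    elif path.startswith("/private/"):
        # Standard directives, widely respected by search engines.
        headers["X-Robots-Tag"] = "noindex, nofollow"
    return headers
```

In practice you would set such headers in your web server or framework configuration rather than in application code, but the decision logic is the same.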
What if I have no AI bot rules in my robots.txt?
If you have no specific rules for AI bots, they follow the default User-agent: * rules. If you have no restrictions there either, all bots (including AI bots) may crawl your entire site. It's wise to make a deliberate choice and document it in your robots.txt.
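This fallback behavior can be verified with Python's standard-library parser: a bot with no group of its own inherits the User-agent: * rules (the robots.txt body below is a made-up example):

```python
from urllib.robotparser import RobotFileParser

# No AI-specific groups: only a default group with one restriction.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# GPTBot has no group of its own, so it falls back to the * rules:
blocked = parser.can_fetch("GPTBot", "/private/report.html")  # False
allowed = parser.can_fetch("GPTBot", "/blog/post.html")       # True
```

Adding an explicit `User-agent: GPTBot` group would override the default entirely for that bot, which is why a deliberate, documented choice beats relying on the fallback.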
Do AI bot user agents change regularly?
Major AI companies document their user agents and announce changes in advance. However, new bots are regularly added as more companies launch AI products. It's advisable to review your robots.txt at least quarterly and add new AI bots to your policy.