AEO & AI SEO · 4 min read

AI Bot Rules

Rules in robots.txt and HTTP headers to control AI crawler behavior.

Bas Vermeer, SEO/AEO Specialist

AI bot rules are instructions that let website owners control AI crawler behavior. This is done primarily via robots.txt, but also via HTTP headers and meta tags. You determine which AI bots may index, scrape, or use your content for training.

Known AI bots

The most important AI bots are: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), Google-Extended (Google AI training), Applebot-Extended (Apple Intelligence), CCBot (Common Crawl), and Amazonbot. Each bot has its own user-agent string.
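As a minimal sketch of how these user-agent tokens can be used in practice, the snippet below classifies an incoming User-Agent header against the bot names listed above. Matching is substring-based, since real User-Agent strings embed the token in a longer value. Note that some tokens, such as Google-Extended, are robots.txt product tokens only and do not appear in request headers; the function and token list are illustrative, not an official registry.

```python
# AI bot tokens and their owners, taken from the list above.
# Caveat: Google-Extended is a robots.txt-only token; it is included
# here purely for completeness of the illustration.
AI_BOT_TOKENS = {
    "GPTBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
    "Google-Extended": "Google",
    "Applebot-Extended": "Apple",
    "CCBot": "Common Crawl",
    "Amazonbot": "Amazon",
}

def identify_ai_bot(user_agent):
    """Return the owner of the first matching AI bot token, else None."""
    ua = user_agent.lower()
    for token, owner in AI_BOT_TOKENS.items():
        if token.lower() in ua:
            return owner
    return None

print(identify_ai_bot(
    "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"
))  # OpenAI
print(identify_ai_bot("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"))  # None
```

In a real setup you would run a check like this over your server log files to see which AI bots already visit your site before deciding what to allow or block.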

Strategic choices

Do you block AI bots entirely, or allow them selectively? Many businesses choose a middle ground: they allow crawling for AI visibility but block training-specific bots. The right strategy depends on your goals: do you want to be cited by AI, or do you want to protect your content?

Reference table: known AI bots

| User-agent | Owner | Purpose | Respects robots.txt |
| --- | --- | --- | --- |
| GPTBot | OpenAI | Web crawling for ChatGPT and AI products | Yes |
| OAI-SearchBot | OpenAI | ChatGPT Search (real-time search results) | Yes |
| ChatGPT-User | OpenAI | Fetching pages when a user shares a URL in ChatGPT | Yes |
| ClaudeBot | Anthropic | Training and improvement of Claude models | Yes |
| PerplexityBot | Perplexity AI | Real-time search results in Perplexity | Yes |
| Google-Extended | Google | AI training (Gemini), not for regular Google Search | Yes |
| Googlebot | Google | Regular search index (incl. AI Overviews) | Yes |
| Applebot-Extended | Apple | Apple Intelligence and Siri training | Yes |
| Applebot | Apple | Siri and Spotlight suggestions | Yes |
| CCBot | Common Crawl | Open dataset, used by many AI models for training | Yes |
| Amazonbot | Amazon | Alexa answers and Amazon AI products | Yes |
| Bytespider | ByteDance | TikTok search and AI training | Partially |
| FacebookBot | Meta | Content preview and AI training | Yes |
| Diffbot | Diffbot | Structured data extraction for AI Knowledge Graphs | Yes |
| cohere-ai | Cohere | Training of Cohere's language models | Yes |
| anthropic-ai | Anthropic | Web research for Claude | Yes |

Robots.txt templates for AI bots

Strategy 1: Allow all (maximum AI visibility)

# Allow all AI bots for maximum visibility
# in AI answers and search results

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: Amazonbot
Allow: /

Strategy 2: Selective access (balance visibility/protection)

# Allow AI search engines, block training bots
# Balance between visibility in AI answers
# and protection against unauthorized training

# Allow: bots that cite your content with source attribution
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Amazonbot
Allow: /

# Block: bots that primarily train without citation
User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

Strategy 3: Block all (maximum content protection)

# Block all known AI bots
# Note: this significantly reduces your visibility
# in AI answers

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Diffbot
Disallow: /

What does our scanner check?

The scanner analyzes your robots.txt for AI bot rules. We check which AI crawlers you allow and which you block, and whether you have a deliberate strategy (rather than no rules at all). This is part of both the AEO score and the Agent Readiness score.

Frequently asked questions

Should I allow or block AI bots?

This depends on your goals. If you want to be cited in AI answers (AEO), allow at minimum GPTBot, PerplexityBot, and ClaudeBot. If you want to protect your content from AI training without citation, block Google-Extended, CCBot, and Bytespider. Most businesses choose a middle ground: allow search-related bots, block training-only bots.

Do AI bots actually respect robots.txt?

Major AI companies (OpenAI, Anthropic, Google, Perplexity) respect robots.txt. This is in their own interest: if they ignored it, more websites would block AI bots outright, which would harm the ecosystem they depend on. Smaller or lesser-known bots are less reliable. Keep in mind that robots.txt is a convention, not legal protection.

Can I block AI bots with HTTP headers instead of robots.txt?

Yes. You can use the X-Robots-Tag HTTP header with directives such as "noai" or "noimageai" for specific pages. Note that these directives are an emerging convention rather than a web standard, so support varies per crawler. The header approach provides more granular control than robots.txt, which only works at the path level. The meta tag <meta name="robots" content="noai"> works similarly at the page level.
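As an illustration of the header approach, the fragment below shows one way to send the directive for a single path in nginx. Both nginx and the /premium/ path are assumptions for the example; equivalent directives exist for Apache (Header set) and most CDNs. Support for the "noai" values varies per crawler.

```nginx
# Send the non-standard "noai" directives for one path only.
# Support for these values varies per crawler.
location /premium/ {
    add_header X-Robots-Tag "noai, noimageai";
}
```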

What if I have no AI bot rules in my robots.txt?

If you have no specific rules for AI bots, they follow the default User-agent: * rules. If you have no restrictions there either, all bots (including AI bots) may crawl your entire site. It's wise to make a deliberate choice and document it in your robots.txt.
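This fallback behavior can be demonstrated with Python's built-in robots.txt parser. In the sketch below, GPTBot is blocked by its own entry, while ClaudeBot (which has no entry of its own) falls back to the default User-agent: * rules; the robots.txt content and example.com URL are illustrative:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot matches its own entry and is blocked everywhere.
print(parser.can_fetch("GPTBot", "https://example.com/blog/"))     # False
# ClaudeBot has no entry, so the default * rules apply.
print(parser.can_fetch("ClaudeBot", "https://example.com/blog/"))  # True
```

This is also a quick way to sanity-check your own robots.txt before deploying it: run each bot's user agent against the paths you care about.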

Do AI bot user agents change regularly?

Major AI companies document their user agents and announce changes in advance. However, new bots are regularly added as more companies launch AI products. It's advisable to review your robots.txt at least quarterly and add new AI bots to your policy.

RELATED TERMS

Crawling

The automated scanning of websites by search engines and AI bots to discover content.

Reinier Sierag

RELATED SCANNER CHECKS

AI bot rules
AI bot blocking rules
Bas Vermeer

SEO/AEO Specialist

My career started by manually combing through server log files. I wanted to understand how Googlebot crawls websites. That fascination with the technical side of discoverability? Never faded. At Koba...