Meta robots tags: fine-grained control over AI bots
What are meta robots tags and why do they matter?
Meta robots tags are HTML instructions that you place in the head section of a page to tell crawlers how to treat the page. Where robots.txt works at the site level and dictates which URL paths a crawler may visit, meta robots tags work at the page level and provide instructions about what a crawler may do with the content after visiting.
This distinction is fundamental. A robots.txt block prevents a crawler from visiting the page, but if the page is discovered anyway (for example via a link on another site), the crawler can still index it based on the link text. A meta robots noindex instruction, on the other hand, explicitly tells the crawler not to include the page in its index, regardless of how it found the page. For complete control over your AI visibility you need both instruments. Read our article on robots.txt for AI for the site-wide perspective.
Robots.txt blocks access to pages. Meta robots tags control what crawlers do after visiting your page. Use both together for complete control over your AI visibility.
The standard meta robots directives
The most commonly used meta robots directives have been part of the web for decades. These directives are respected by all major search engines and most AI crawlers.
```html
<head>
  <!-- Index and follow links (default behavior) -->
  <meta name="robots" content="index, follow" />

  <!-- Do not index, but follow links -->
  <meta name="robots" content="noindex, follow" />

  <!-- Index, but do not follow links -->
  <meta name="robots" content="index, nofollow" />

  <!-- Do not index and do not follow links -->
  <meta name="robots" content="noindex, nofollow" />

  <!-- Do not show a snippet in search results -->
  <meta name="robots" content="nosnippet" />

  <!-- Do not index images on this page -->
  <meta name="robots" content="noimageindex" />

  <!-- Do not offer a cached version -->
  <meta name="robots" content="noarchive" />
</head>
```

Explanation per directive
- index / noindex: determines whether the page may be included in the crawler's index. noindex is the strongest signal to keep a page out of AI results.
- follow / nofollow: determines whether the crawler may follow links on the page. nofollow prevents the crawler from discovering other pages through your page.
- nosnippet: prevents a search engine or AI model from showing a text fragment of your page. Useful if you do not want your content to be quoted.
- noimageindex: prevents images on the page from being indexed. Has limited support among AI crawlers.
- noarchive: prevents a cached copy of your page from being made available. Relevant for sensitive content.
Bot-specific meta robots tags
One of the most powerful capabilities of meta robots tags is that you can give instructions to specific bots. Instead of the generic "robots" as the name value, you can use the name of a specific crawler.
```html
<head>
  <!-- Instructions for all crawlers -->
  <meta name="robots" content="index, follow" />

  <!-- GPTBot may not index this page -->
  <meta name="GPTBot" content="noindex" />

  <!-- Google may index but not show a snippet -->
  <meta name="googlebot" content="index, nosnippet" />

  <!-- Google-Extended (Gemini) may not use content for training -->
  <meta name="Google-Extended" content="noindex" />

  <!-- PerplexityBot may index and follow links -->
  <meta name="PerplexityBot" content="index, follow" />

  <!-- Block Anthropic's crawler -->
  <meta name="anthropic-ai" content="noindex" />
</head>
```

Bot-specific meta robots tags give you granular control that robots.txt does not offer. For example, you can allow Googlebot to index your content for search results while blocking Google-Extended for AI training use, or allow PerplexityBot to cite your content while instructing GPTBot to leave it out. Be aware that not every AI crawler documents support for bot-specific meta tag names: Google-Extended, for instance, is officially honored as a robots.txt user agent token, so treat these tags as one signal among several rather than a guarantee. This differentiated approach is particularly valuable for organizations that want to choose strategically which AI platforms may use their content. Combine this with insights from our article on security headers for a complete control picture.
New directives for the AI era
With the rise of AI models that use web content for training and answer generation, new meta robots directives are emerging that specifically target AI usage.
noai and noimageai
The noai directive signals that content may not be used for AI model training; the noimageai directive does the same specifically for images. These directives were popularized by DeviantArt rather than introduced by a search engine, and they are not yet part of an official standard, so treat them as an opt-out signal rather than a guarantee.
```html
<head>
  <!-- Signal that AI models may not use this content for training -->
  <meta name="robots" content="noai" />

  <!-- Signal that images on this page may not be used by AI -->
  <meta name="robots" content="noimageai" />

  <!-- Combine multiple directives -->
  <meta name="robots" content="index, follow, noai" />

  <!-- Allow indexing but block AI training: the page appears in
       search results but is not used for training AI models -->
  <meta name="robots" content="index, follow, noai, noimageai" />
</head>
```

max-snippet and its effect on AI citations
The max-snippet directive controls how many characters a search engine or AI model may use as a fragment of your content. This is a subtle but powerful instrument to limit how much of your content is directly shown without the user clicking through.
```html
<head>
  <!-- Allow a maximum of 160 characters as snippet -->
  <meta name="robots" content="max-snippet:160" />

  <!-- Allow no snippet at all -->
  <meta name="robots" content="max-snippet:0" />

  <!-- No limit on snippet length -->
  <meta name="robots" content="max-snippet:-1" />

  <!-- Combine with other directives -->
  <meta name="robots" content="index, follow, max-snippet:200, noai" />
</head>
```

A max-snippet of 160 characters is comparable to a standard meta description. This limits how much of your content an AI model can display directly, encouraging users to click through to your website. Note: not all AI models respect max-snippet yet. It is, however, a signal that you consciously think about content usage.
Dive deeper: Robots.txt for AI: more than just crawl instructions | Canonical URLs and AI duplicates | E-E-A-T optimization for AI
X-Robots-Tag: HTTP header alternative
Not all content is HTML. PDF files, images and API responses do not have an HTML head section. For these cases, the X-Robots-Tag HTTP header provides the same functionality as the meta robots tag.
```nginx
# X-Robots-Tag via HTTP headers (Nginx configuration)

# Block indexing of all PDF files
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex, nofollow" always;
}

# Signal no AI use for images
location ~* \.(jpg|jpeg|png|webp)$ {
    add_header X-Robots-Tag "noimageai" always;
}

# Bot-specific X-Robots-Tag
location /premium-content/ {
    add_header X-Robots-Tag "GPTBot: noindex" always;
    add_header X-Robots-Tag "googlebot: index, nosnippet" always;
}
```

```php
// Laravel middleware: app/Http/Middleware/XRobotsTag.php
public function handle($request, Closure $next)
{
    $response = $next($request);

    if ($request->is('premium/*')) {
        $response->headers->set(
            'X-Robots-Tag',
            'noindex, nofollow'
        );
    }

    return $response;
}
```

The X-Robots-Tag header supports the same directives as the meta robots tag, including bot-specific targeting. This makes it an indispensable instrument for controlling AI access to non-HTML resources.
Strategic deployment of meta robots for AI
The power of meta robots tags lies in the strategic combination of directives per page type. Here is an overview of common scenarios.
- Public blog posts and articles: index, follow. Maximum visibility in both search results and AI answers.
- Premium or paid content: noindex or nosnippet. Prevent AI models from displaying your full content without the user paying.
- Internal search results and filter pages: noindex, follow. Prevent indexing of thin content but let crawlers follow the links.
- Sensitive business information: noindex, nofollow, noarchive. Maximum restriction for content that should not exist outside your site.
- Content you want in Google but not in AI training: index, follow, noai. Stay visible in search results but block AI training use.
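The scenarios above lend themselves to a central policy map rather than hand-written tags per template. A minimal sketch, assuming hypothetical page-type names of your own choosing:

```python
# Sketch: one central mapping from page type to meta robots directives,
# mirroring the scenarios listed above. The page-type keys are
# illustrative, not a fixed convention.

ROBOTS_POLICIES = {
    "blog_post": "index, follow",
    "premium_content": "noindex, nosnippet",
    "internal_search": "noindex, follow",
    "sensitive": "noindex, nofollow, noarchive",
    "search_only": "index, follow, noai",
}

def robots_meta_tag(page_type: str) -> str:
    """Return the meta robots tag for a page type (default: fully open)."""
    content = ROBOTS_POLICIES.get(page_type, "index, follow")
    return f'<meta name="robots" content="{content}" />'

print(robots_meta_tag("premium_content"))
# → <meta name="robots" content="noindex, nosnippet" />
```

Keeping the policy in one place means a strategy change (for example, adding noai site-wide) is a one-line edit instead of a template hunt.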
Meta robots tags are your fine-grained control over AI. Where robots.txt is a moat, meta robots tags are the individual locks on every door in your castle.
Key takeaways
- Meta robots tags work at the page level and control what crawlers do after visiting your page, complementing robots.txt which manages access.
- Bot-specific meta robots tags (name="GPTBot") give you granular control over which AI platforms may index and use your content.
- New directives like noai and noimageai offer specific control over AI training use, separate from regular search indexing.
- The X-Robots-Tag HTTP header provides the same functionality for non-HTML content like PDFs and images.
- Combine meta robots tags strategically per page type: maximum visibility for public content, targeted restrictions for premium and sensitive content.
Frequently asked questions
Do all AI crawlers respect meta robots tags?
The major AI crawlers (GPTBot, Google-Extended, ClaudeBot) respect the long-established robots directives such as noindex and nofollow. Support for newer directives like noai varies and is not an official standard: Google, for example, offers AI training control through the Google-Extended robots.txt token rather than through a noai meta tag, and OpenAI and Anthropic primarily document robots.txt compliance for their crawlers. It is therefore wise to use robots.txt alongside meta robots for double protection.
What is the difference between noindex in robots.txt and in meta robots?
Robots.txt does not contain a noindex directive. Robots.txt can only block access via Disallow. If a crawler is not allowed to visit a URL (Disallow), it cannot read the meta robots tag. But if the same URL is discovered through an external link, the crawler can still index the URL based on the link text. A meta robots noindex is therefore more effective: it explicitly tells the crawler not to index the page, regardless of how it found it.
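This interplay can be made concrete. A sketch of the trap (paths are illustrative): for a page that must stay out of the index, you have to allow the crawl so the crawler can actually read the noindex tag.

```text
# robots.txt — do NOT Disallow a page you want de-indexed;
# a blocked crawler never fetches the page and never sees its meta tag
User-agent: *
Allow: /old-campaign/

<!-- /old-campaign/index.html — readable by the crawler, so honored -->
<meta name="robots" content="noindex, follow" />
```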
Can I set meta robots tags dynamically per user or session?
Technically this is possible, but it is strongly discouraged. If you serve different meta robots tags to crawlers versus users, this is considered cloaking and can lead to penalties. The meta robots tags must be identical for all visitors (including crawlers). Instead, use server-side logic that consistently delivers the same tags based on the page type, not based on the visitor.
Does noindex affect my links and domain authority?
A noindex tag prevents the page from being included in the index, but if you add "follow" (noindex, follow), the crawler still follows the links on the page. Link equity still flows to the linked pages. If you also want to block links, use noindex, nofollow. The impact on domain authority is indirect: non-indexed pages do not contribute to your visibility, but the links on them still can.
How do I test whether my meta robots tags work correctly?
Check the HTML source code of your page (Ctrl+U) and look for the meta robots tag in the head section. Use the URL Inspection tool in Google Search Console to see how Google interprets your tags. For the HTTP header variant (X-Robots-Tag), use curl -I https://yoursite.com/page to view the response headers. Also check Google Search Console for any warnings about conflicting indexing instructions.
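For automated checks across many pages, the tags can also be extracted with the standard library. A minimal sketch: feed it fetched HTML (for example via urllib) and it collects the content value per meta name; the set of bot names to watch is an assumption you would adapt to your own policy.

```python
# Sketch: extract meta robots tags from a page's HTML using only
# Python's standard library, for automated verification.

from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects meta robots directives, keyed by lowercased meta name."""

    WATCHED = {"robots", "gptbot", "googlebot", "perplexitybot"}

    def __init__(self):
        super().__init__()
        self.tags = {}  # meta name -> content value

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in self.WATCHED:
            self.tags[name] = attrs.get("content", "")

html = ('<head><meta name="robots" content="index, follow" />'
        '<meta name="GPTBot" content="noindex" /></head>')
parser = RobotsMetaParser()
parser.feed(html)
print(parser.tags["gptbot"])  # → noindex
```

Run this against your rendered pages in CI and you catch a template accidentally shipping the wrong directive before a crawler does.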
In the AI era you need fine-grained control over your content. Meta robots tags are the instrument with which you determine per page, per bot and per directive who may do what with your content.