ai.txt: the new proposal for AI instructions

Bas Vermeer, SEO/AEO Specialist

Why robots.txt is not enough for AI

Robots.txt is a standard from 1994, originally designed for web crawlers building search indexes. It offers two basic instructions: "you may crawl this path" and "you may not crawl this path." In an era of simple search bots, that was sufficient. But AI systems are fundamentally different from traditional search crawlers and raise questions that robots.txt cannot answer.

May an AI model use my content for training? May it cite my texts in answers? Should attribution be added? Which sections of my site contain the most authoritative content? None of these questions fit within the crawl/do-not-crawl model of robots.txt. Website owners need a richer communication channel with AI systems, and that is exactly what ai.txt aims to provide.

The ai.txt proposal, introduced in 2025 by a coalition of content publishers, AI companies and web standards organizations, defines a new file at the root of a website specifically designed for communication with AI systems. It is not a replacement for robots.txt but a supplement that addresses its limitations.

IMPORTANT

ai.txt is a proposal in development, not a ratified standard. The specification is actively evolving and may change. Nevertheless, multiple AI providers are already implementing support, making early adoption beneficial.

The structure of ai.txt

The ai.txt file uses a section-based format that is intuitively readable by both humans and machines. Each section starts with a header in square brackets and contains key-value pairs that define instructions.

# ai.txt - AI instructions for example.com
# Last updated: 2026-04-01

[general]
ai_training = no
ai_inference = yes
attribution_required = yes
preferred_citation_format = "Source: {title} - {url}"
contact = ai-policy@example.com
license_url = https://example.com/ai-license

[content]
primary_language = en
alternate_language = nl
authoritative_sections = /blog, /knowledge-base, /research
non_authoritative_sections = /archive, /legacy
preferred_format = html
llms_txt = /llms.txt

[crawling]
max_requests_per_minute = 20
preferred_hours = 02:00-06:00 UTC
respect_robots_txt = required
cache_duration = 6h

[agents]
a2a_endpoint = /api/a2a
mcp_endpoint = /.well-known/mcp.json
agent_card_url = /.well-known/agent-card.json
authentication = web_bot_auth

[legal]
tdm_reservation = yes
tdm_policy = /.well-known/tdm-policy.json
jurisdiction = EU
gdpr_contact = privacy@example.com

The file is built from five sections. The `[general]` section contains the basic instructions for AI usage. The `[content]` section describes the content structure. The `[crawling]` section defines technical crawl parameters. The `[agents]` section references endpoints for AI agents. The `[legal]` section contains legal and compliance information.

The [general] section in detail

The general section is the heart of ai.txt. Here you define the fundamental rules for how AI systems may use your content.

  • `ai_training`: yes/no. Indicates whether your content may be used for training AI models. This is the most impactful field.
  • `ai_inference`: yes/no. Indicates whether AI models may fetch and cite your content in real-time answers (retrieval-augmented generation).
  • `attribution_required`: yes/no. Specifies whether source attribution is required when your content is cited.
  • `preferred_citation_format`: a template indicating how you want to be cited. Variables like {title}, {url} and {author} are filled in by the AI system.
  • `contact`: an email address for questions about your AI policy.
  • `license_url`: a link to your full license terms.
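
Because the proposed format follows INI conventions, an AI consumer can read it with standard tooling. A minimal Python sketch, assuming the file keeps the INI-compatible syntax shown in the example above (the draft has no ratified grammar yet):

import configparser

# Parse ai.txt with Python's standard INI parser.
# Assumption: the draft format stays INI-compatible, as in the example above.
parser = configparser.ConfigParser()
with open("ai.txt", encoding="utf-8") as f:
    parser.read_file(f)

general = parser["general"]
allow_training = general.get("ai_training", "no").lower() == "yes"
allow_inference = general.get("ai_inference", "yes").lower() == "yes"
citation_template = general.get("preferred_citation_format", "").strip('"')

print(f"Training allowed:  {allow_training}")
print(f"Inference allowed: {allow_inference}")
print(f"Citation template: {citation_template}")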

The distinction between training and inference is crucial and aligns with developments around TDM headers. Where TDM headers communicate this distinction as an HTTP header, ai.txt offers the same information as a persistent file. Both mechanisms reinforce each other: an AI system visiting your site checks both the TDM headers and ai.txt and uses the most specific instruction. Read more about how to combine llms.txt with ai.txt for a complete communication model.
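
In code, that precedence could look like the sketch below. The tdm-reservation header name comes from the TDM Reservation Protocol; the precedence order itself is an interpretation of the "most specific instruction" rule, not prescribed behavior.

import requests

def may_train_on(url: str, ai_txt_allows_training: bool) -> bool:
    """Decide whether a single page may be used for training.

    Assumed precedence: a page-level TDM header is more specific than
    the site-wide ai.txt setting, so it wins whenever it is present.
    """
    response = requests.head(url, timeout=10)
    tdm = response.headers.get("tdm-reservation")  # TDM Reservation Protocol
    if tdm is not None:
        # "1" means rights are reserved: no training on this page.
        return tdm.strip() != "1"
    # No page-level signal: fall back to ai.txt's [general] ai_training.
    return ai_txt_allows_training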

The [content] and [crawling] sections

The content section helps AI systems better understand your site and identify the most relevant content.

[content]
# Which sections contain your most authoritative content?
authoritative_sections = /blog, /knowledge-base, /research

# Which sections are outdated or less reliable?
non_authoritative_sections = /archive, /legacy

# In which format do you preferably deliver content?
# html = regular HTML pages
# markdown = Markdown version available
# structured = JSON-LD or structured data available
preferred_format = html

# Reference to your llms.txt for AI-specific content
llms_txt = /llms.txt

[crawling]
# Maximum number of requests per minute
max_requests_per_minute = 20

# Preferred hours for intensive crawling (off-peak)
preferred_hours = 02:00-06:00 UTC

# How long may AI systems cache your content?
cache_duration = 6h

# Is respecting robots.txt required or optional?
respect_robots_txt = required
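
On the consuming side, a well-behaved crawler could honor the [crawling] parameters with simple pacing logic. A Python sketch, assuming the values have already been parsed from the section above:

import time
from datetime import datetime, timezone

MAX_REQUESTS_PER_MINUTE = 20   # from max_requests_per_minute
PREFERRED_HOURS = range(2, 6)  # from preferred_hours = 02:00-06:00 UTC

def within_preferred_hours() -> bool:
    return datetime.now(timezone.utc).hour in PREFERRED_HOURS

def crawl_politely(urls, fetch):
    interval = 60.0 / MAX_REQUESTS_PER_MINUTE  # seconds between requests
    for url in urls:
        # Outside the preferred window, slow down to half the allowed rate.
        time.sleep(interval if within_preferred_hours() else interval * 2)
        fetch(url)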

The `authoritative_sections` field is particularly powerful. With it, you tell AI models which parts of your site contain the most reliable, current and citable content. When in doubt, this helps AI models cite your best content rather than an arbitrary page.
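
A retrieval pipeline could apply those fields as a lightweight ranking signal. A sketch, reusing the example paths from the file above:

from urllib.parse import urlparse

AUTHORITATIVE = ("/blog", "/knowledge-base", "/research")
NON_AUTHORITATIVE = ("/archive", "/legacy")

def citation_priority(url: str) -> int:
    """Rank candidate pages: authoritative first, legacy last."""
    path = urlparse(url).path
    if path.startswith(AUTHORITATIVE):
        return 0
    if path.startswith(NON_AUTHORITATIVE):
        return 2
    return 1

urls = [
    "https://example.com/legacy/old-post",
    "https://example.com/blog/ai-txt-guide",
    "https://example.com/about",
]
print(sorted(urls, key=citation_priority))
# /blog first, /about in the middle, /legacy last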

The [agents] and [legal] sections

The agents section functions as a discovery layer for AI agents that go beyond just fetching content. It references endpoints for the A2A Protocol, MCP servers and authentication mechanisms. This makes ai.txt a central starting point for every form of AI interaction with your site.
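
For instance, an agent could resolve the relative paths from [agents] against the site origin and fetch the Agent Card. A sketch; the endpoint values mirror the example file, and the card fields shown are illustrative:

import requests
from urllib.parse import urljoin

ORIGIN = "https://example.com"

# Values as they would be parsed from the [agents] section.
agents = {
    "a2a_endpoint": "/api/a2a",
    "mcp_endpoint": "/.well-known/mcp.json",
    "agent_card_url": "/.well-known/agent-card.json",
}

card = requests.get(urljoin(ORIGIN, agents["agent_card_url"]), timeout=10).json()
print(card.get("name"), card.get("capabilities"))  # illustrative fields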

The legal section integrates with existing legal mechanisms. The `tdm_reservation` field communicates the same information as the TDM-Reservation HTTP header, but as a persistent file that AI systems can consult without first requesting a page. The `jurisdiction` field clarifies under which legal system you operate, which is relevant for interpreting your instructions.

Implementing ai.txt on your website

The implementation is straightforward: create a plain-text file named ai.txt at the root of your domain and serve it with the correct Content-Type.

# Step 1: Create the file
# Location: /ai.txt (root of your domain)

# Step 2: Serve with correct Content-Type
# Nginx configuration:
location = /ai.txt {
    default_type text/plain;
    charset utf-8;
    add_header Cache-Control "public, max-age=86400";
    add_header X-Robots-Tag "noindex";
}

# Step 3: Link from your robots.txt (optional but recommended)
# Add to robots.txt:
AI-Policy: /ai.txt

# Step 4: Link from your HTML head (optional)
# <link rel="ai-policy" href="/ai.txt" />

# Step 5: Verify
curl -I https://example.com/ai.txt
# Expected: Content-Type: text/plain; charset=utf-8
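
For Apache-based stacks, a comparable configuration is possible. This sketch assumes mod_headers is enabled and is not part of the proposal itself:

# Apache equivalent of the Nginx block above (requires mod_headers):
<Files "ai.txt">
    ForceType text/plain
    AddDefaultCharset utf-8
    Header set Cache-Control "public, max-age=86400"
    Header set X-Robots-Tag "noindex"
</Files>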
TIP

Add an X-Robots-Tag: noindex header to your ai.txt response. The file is intended for machines, not for search engines to index and display as a search result.

ai.txt in the context of the broader ecosystem

ai.txt does not stand alone but forms part of a growing ecosystem of standards that together form the communication layer between websites and AI systems.

Robots.txt defines crawl rules. Llms.txt provides structured content for LLMs. TDM headers communicate legal conditions. Security headers build trust. Agent Cards identify bots. And ai.txt functions as the overarching instruction set that connects and supplements all these elements. An AI system visiting your site can use ai.txt as a starting point to discover which other files and endpoints are available.

  1. AI system visits /ai.txt as the first discovery step.
  2. Reads the [general] section for basic rules about training and inference.
  3. Follows the reference to /llms.txt for structured content.
  4. Checks the [crawling] section for rate limits and preferred hours.
  5. Consults the [agents] section for available A2A and MCP endpoints.
  6. Verifies the [legal] section and TDM policy for legal compliance.
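
Expressed as code, that flow could look like the sketch below. It reuses Python's INI parser, assumes the field names from the example file, and omits error handling:

import configparser
import requests
from urllib.parse import urljoin

ORIGIN = "https://example.com"

# Step 1: fetch /ai.txt as the first discovery step.
raw = requests.get(urljoin(ORIGIN, "/ai.txt"), timeout=10).text
policy = configparser.ConfigParser()
policy.read_string(raw)

# Step 2: basic rules about training and inference.
may_train = policy.get("general", "ai_training", fallback="no") == "yes"
may_infer = policy.get("general", "ai_inference", fallback="yes") == "yes"

# Step 3: follow the reference to llms.txt for structured content.
llms_path = policy.get("content", "llms_txt", fallback=None)
llms = requests.get(urljoin(ORIGIN, llms_path), timeout=10).text if llms_path else None

# Step 4: rate limits and preferred crawl hours.
rpm = policy.getint("crawling", "max_requests_per_minute", fallback=10)

# Step 5: available agent endpoints.
mcp_endpoint = policy.get("agents", "mcp_endpoint", fallback=None)

# Step 6: legal signals such as the TDM reservation.
tdm_reserved = policy.get("legal", "tdm_reservation", fallback="no") == "yes"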
Robots.txt was the first conversation between websites and bots. ai.txt is the evolution toward a mature dialogue with AI systems that do not just crawl, but understand, cite and act.

Key takeaways

  • ai.txt is a proposal for a new file format that gives website owners a standardized way to communicate instructions to AI systems.
  • The file is built from five sections: general (basic rules), content (content structure), crawling (technical parameters), agents (endpoints) and legal (legal and compliance information).
  • The crucial distinction between training and inference is made explicit, allowing you to permit AI citation while prohibiting training.
  • ai.txt functions as a central starting point that references other standards such as llms.txt, TDM policy, MCP endpoints and Agent Cards.
  • Implementation is straightforward: a text file at the root of your domain, served with the correct Content-Type and optionally linked from robots.txt.

Frequently asked questions

Is ai.txt already an official standard?

No, ai.txt is in the proposal phase. A community specification has been published and several AI providers (including Perplexity and Anthropic) are experimenting with support. The path to an official standard through the W3C or IETF has started but is not yet completed. Nevertheless, the risk of early adoption is low: the file causes no harm if it is not read and offers benefits as soon as support grows.

Does ai.txt replace robots.txt?

No, ai.txt is a supplement to robots.txt, not a replacement. Robots.txt remains the primary mechanism for crawl instructions and is universally supported. ai.txt adds a communication layer that robots.txt cannot provide: instructions about data usage, training consent, citation preferences and agent endpoints. Use both files together for optimal communication with AI systems.

What if an AI system ignores my ai.txt?

In the current situation, ai.txt, like robots.txt, is a request without direct legal enforceability (unlike TDM headers in the EU). AI systems that ignore ai.txt are not breaking the law, but they are acting contrary to an explicit wish of the site owner. As the standard gains broader adoption and potentially receives legal backing, ignoring it could have legal consequences.

Can I dynamically generate ai.txt?

Yes, and this can be useful for sites that apply different rules per section. You can generate ai.txt via a server-side script (for example a Laravel route) that builds the instructions based on your configuration. Just ensure the result is cacheable and remains consistent, so AI systems do not receive a different answer with each request.
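
As an illustration, a minimal Python/Flask sketch (the framework choice is arbitrary; the Laravel route mentioned above would follow the same pattern):

from flask import Flask, Response

app = Flask(__name__)

# Site-wide policy kept in configuration; this could also come from a database.
POLICY = {"ai_training": "no", "ai_inference": "yes"}

@app.route("/ai.txt")
def ai_txt():
    body = "\n".join([
        "[general]",
        f"ai_training = {POLICY['ai_training']}",
        f"ai_inference = {POLICY['ai_inference']}",
    ])
    # Cacheable and consistent, so AI systems see a stable answer.
    return Response(body, mimetype="text/plain",
                    headers={"Cache-Control": "public, max-age=86400"})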

How many websites already use ai.txt?

Exact figures are difficult to determine, but a scan of the Alexa top 10,000 in March 2026 showed that about 3% of sites had an ai.txt file. For comparison: llms.txt was present on about 5% and security.txt on about 12%. Adoption is growing rapidly, especially among media companies, publishers and technology companies that are actively working on their AI strategy.

ai.txt is the answer to a simple question: if AI becomes the most important reader of your website, how do you tell AI what you expect from it? With a clear, standardized instruction file.

