Well-known URIs for AI: an overview

Bas Vermeer, SEO/AEO Specialist

What are well-known URIs?

Well-known URIs are standardized paths on a web server that host specific metadata or configuration files. The concept is defined in RFC 8615 and allows clients (browsers, bots, agents) to request information at a predictable location without prior knowledge. The most familiar example is `/.well-known/` as a prefix, but root files like `/robots.txt` and `/sitemap.xml` functionally fall under the same category.

For AI crawlers and AI agents, well-known URIs are the first place they look when visiting a website. They form the discovery layer: the mechanism through which bots and agents learn how they are allowed to interact with a website. A website without the right well-known files is like an office building without a name plate or reception desk. You are there, but nobody knows how you want to be approached.

The relevance of well-known URIs for AI visibility has grown significantly over the past two years. Where traditional SEO was limited to robots.txt and sitemaps, effective Answer Engine Optimization requires a broader arsenal of standardized files that inform AI bots and agents about your site, your content and your terms.

IMPORTANT

Well-known URIs are not optional extras. They form the foundation of communication between your website and AI systems. Every missing file is a missed opportunity to guide how AI discovers, indexes and uses your content.

The essential well-known files for AI

Below is an overview of the most important well-known URIs and root files relevant to AI crawlers and AI agents, ordered from widely adopted to emerging.

robots.txt (root)

The oldest and most universal file for bot communication. Robots.txt specifies which paths bots may crawl and which they may not. In the AI context, it is crucial to include specific rules for AI crawlers such as GPTBot, ClaudeBot, PerplexityBot and GoogleOther. See our comprehensive article on robots.txt for AI for the full implementation guide.
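
As a minimal illustration (the exact rules depend on your own crawl policy, and example.com is a placeholder), a robots.txt that addresses the major AI crawlers explicitly might look like this:

# Illustrative robots.txt with explicit rules for AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GoogleOther
Allow: /

# Default rules for all other bots
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml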

sitemap.xml (root)

Your XML sitemap tells crawlers which pages exist on your site, when they were last updated and how important they are. AI crawlers use sitemaps to efficiently determine which pages are worth crawling, especially on large websites.
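
A minimal sitemap entry carries exactly the signals mentioned above: the URL, the last modification date and a relative priority. The URL below is a placeholder:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/pricing</loc>
    <lastmod>2025-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>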

llms.txt (root)

An emerging file specifically designed for Large Language Models. It offers a structured, simplified version of your site content that LLMs can process more easily than regular HTML. Read more about the format and implementation in our article on llms.txt.
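
To give an idea of the format (a Markdown document with a title, a short summary and curated links, per the current llms.txt proposal), a sketch could look like this; the company name and URLs are placeholders:

# Example Company

> Example Company builds billing software for SaaS teams. The links below point
> to the pages most useful to language models.

## Documentation

- [Getting started](https://example.com/docs/getting-started.md): installation and first invoice
- [API reference](https://example.com/docs/api.md): endpoints, authentication and rate limits

## Optional

- [Changelog](https://example.com/changelog.md): release history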

security.txt (/.well-known/)

RFC 9116 defines security.txt as a standard location for security contact information. AI crawlers use this file as one of their trust signals: a website with a valid security.txt shows that the owner pays attention to security and compliance.
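
A minimal security.txt needs at least a Contact and an Expires field to be valid under RFC 9116; the addresses below are placeholders:

Contact: mailto:security@example.com
Expires: 2026-06-30T23:59:59.000Z
Preferred-Languages: en
Canonical: https://example.com/.well-known/security.txt
Policy: https://example.com/security-policy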

ai.txt (root, proposal)

A new proposal specifically designed to communicate AI instructions. Where robots.txt is limited to crawl instructions, ai.txt provides space for broader guidelines such as data usage, training consent and license terms.
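
Because ai.txt is still a proposal, there is no settled syntax. Treat the sketch below as purely illustrative of the kind of directives such a file could carry, not as a finalized format:

# ai.txt - illustrative sketch, the final syntax may differ
User-agent: *
Allow: /blog/
Disallow: /pricing/
Training: disallowed
License: https://example.com/ai-license
Contact: mailto:licensing@example.com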

# Overview: well-known files for AI
# Location and status per file

/robots.txt                          # Status: universal standard (RFC 9309)
/sitemap.xml                         # Status: universal standard
/llms.txt                            # Status: emerging standard
/ai.txt                              # Status: proposal (draft)
/.well-known/security.txt            # Status: official (RFC 9116)
/.well-known/agent-card.json         # Status: proposal (draft)
/.well-known/bot-auth-keys.json      # Status: proposal (draft)
/.well-known/openid-configuration    # Status: official (OpenID Connect)
/.well-known/mcp.json                # Status: proposal (draft)
/.well-known/tdm-policy.json         # Status: proposal (EU TDM directive)

Emerging well-known URIs for the AI era

In addition to the established files, several new well-known URIs are being developed that specifically target AI interaction.

agent-card.json

This file publishes the identity and capabilities of AI agents deployed by an organization. It is intended for bot operators, not website owners. When a bot visits your site and identifies itself as belonging to a certain operator, you can fetch the agent card to verify who the bot is and what its intentions are.
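
The schema is still a draft, so the example below is a hypothetical sketch of what an agent card could expose (identity, operator, contact and capabilities); none of the field names are guaranteed by a final standard:

{
  "name": "ExampleCorp Research Agent",
  "description": "Fetches public product pages to answer customer questions.",
  "operator": "ExampleCorp B.V.",
  "contact": "mailto:bots@example.com",
  "user_agent": "ExampleBot/1.0",
  "capabilities": ["fetch", "summarize"],
  "keys": "https://bots.example.com/.well-known/bot-auth-keys.json"
}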

bot-auth-keys.json

Part of the Web Bot Auth protocol. This file contains the public keys with which websites can verify the cryptographic identity of AI bots. Similar to JWKS (JSON Web Key Sets) in the OAuth ecosystem.
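
Since the Web Bot Auth drafts are still evolving, the structure below is an assumption modelled on a standard JWKS document; the key value is a placeholder, not real key material:

{
  "keys": [
    {
      "kty": "OKP",
      "crv": "Ed25519",
      "kid": "examplebot-2025-01",
      "use": "sig",
      "x": "<base64url-encoded Ed25519 public key>"
    }
  ]
}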

mcp.json

The Model Context Protocol (MCP) discovery file. Describes which MCP endpoints a server offers, which tools are available and how AI agents can connect. This is directly related to the broader MCP standard we discuss in our article on MCP Servers.
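
There is no finalized discovery schema yet, so the field names below are assumptions; the sketch only illustrates the kind of information such a file could expose (endpoint, transport and available tools):

{
  "name": "Example MCP Server",
  "endpoint": "https://example.com/mcp",
  "transport": "streamable-http",
  "authentication": "oauth2",
  "tools": [
    { "name": "search_docs", "description": "Full-text search across the documentation" },
    { "name": "get_pricing", "description": "Returns current pricing tiers as JSON" }
  ]
}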

tdm-policy.json

Based on the EU Text and Data Mining directive. This file communicates the conditions under which content may be used for text and data mining, including AI training. It contains license information, opt-out declarations and contact details for commercial licenses.
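
One way such a file could be structured, loosely following the W3C TDM Reservation Protocol (the paths and URLs are placeholders, and the exact schema may differ):

[
  {
    "location": "/blog/*",
    "tdm-reservation": 1,
    "tdm-policy": "https://example.com/licenses/tdm-policy"
  },
  {
    "location": "/press/*",
    "tdm-reservation": 0
  }
]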

Implementation and best practices

Correctly implementing well-known URIs requires attention to several technical details that are often overlooked.

# Nginx configuration for well-known URIs

# Ensure /.well-known/ paths are served correctly
location /.well-known/ {
    # Allow direct access (not via Laravel/PHP)
    try_files $uri $uri/ =404;

    # Correct content types
    location ~ \.json$ {
        default_type application/json;
        add_header Access-Control-Allow-Origin "*";
        add_header Cache-Control "public, max-age=86400";
    }

    location ~ \.txt$ {
        default_type text/plain;
        add_header Cache-Control "public, max-age=86400";
    }
}

# Root files
location = /robots.txt {
    default_type text/plain;
    add_header Cache-Control "public, max-age=3600";
}

location = /llms.txt {
    default_type text/plain;
    add_header Cache-Control "public, max-age=86400";
}

location = /ai.txt {
    default_type text/plain;
    add_header Cache-Control "public, max-age=86400";
}
  • Always serve well-known files with the correct Content-Type header (text/plain for .txt, application/json for .json); see the curl spot check after this list.
  • Add CORS headers (Access-Control-Allow-Origin) so JavaScript clients and AI agents can fetch the files.
  • Use caching headers with a reasonable TTL (1 to 24 hours) to limit server load during frequent crawling.
  • Validate your JSON files with a schema validator before deploying; a syntax error makes the file unusable.
  • Monitor 404 errors on well-known paths in your server logs to detect which AI bots request which files.
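
A quick way to spot-check status codes and headers from the command line, assuming curl is available and example.com stands in for your own domain:

# Should return HTTP 200 with content-type: text/plain
curl -sI https://example.com/.well-known/security.txt

# Check Content-Type and Cache-Control for llms.txt
curl -sI https://example.com/llms.txt | grep -i -E "content-type|cache-control"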

A checklist for your well-known configuration

Use this checklist to verify that your website correctly serves the essential well-known files. Each file contributes to how AI systems discover and understand your site. Combine this with a well-configured security headers setup for maximum trust.

  1. Check that /robots.txt exists, is syntactically correct and contains specific rules for AI crawlers.
  2. Verify that /sitemap.xml is current and correctly references all indexable pages.
  3. Implement /llms.txt with a structured overview of your most important content.
  4. Publish /.well-known/security.txt with your security contact information conforming to RFC 9116.
  5. Consider /ai.txt if you want to communicate specific instructions about AI use of your content.
  6. If you deploy AI bots: publish /.well-known/agent-card.json with the identity and intentions of your agents.

Well-known URIs are your website's business card for AI systems. The more complete and accurate they are, the better AI bots and agents know how to handle your content.

Key takeaways

  • Well-known URIs are standardized paths where AI bots and agents request metadata and configuration when visiting your website.
  • The essential files are robots.txt, sitemap.xml, llms.txt and security.txt; emerging standards include ai.txt, agent-card.json and mcp.json.
  • Serve all files with the correct Content-Type, add CORS headers and use reasonable cache TTLs.
  • Monitor which AI bots request which well-known paths to gain insight into how AI systems discover your site.
  • Treat well-known URIs not as one-time configuration but as ongoing maintenance: update them when your site or AI strategy changes.

Frequently asked questions

Do I need to implement all well-known files?

Not necessarily. Start with the essential files: robots.txt, sitemap.xml and security.txt. These are consulted by virtually all AI crawlers. Add llms.txt if you want to help AI models process your content efficiently. The remaining files (ai.txt, agent-card.json, mcp.json) are relevant if you want to actively steer how AI agents interact with your site. Prioritize based on your specific objectives.

What happens if a well-known file returns a 404?

A 404 on a well-known path is not harmful, but it is a missed opportunity. AI bots that receive a 404 continue with their default behavior without the context the file could have provided. They do not block your site because of a missing file, but you have no influence on how they process your content.

How often should I update my well-known files?

Robots.txt and sitemap.xml should always be current; update them whenever your site structure changes. Security.txt should be renewed annually (it has an expiry field). Update llms.txt when you add significant new content. The remaining files are relatively stable and only need updating when your policy or configuration changes.

Can well-known files influence my SEO?

Indirectly, yes. A correctly configured robots.txt and an up-to-date sitemap directly contribute to your traditional SEO. Security.txt and the AI-specific files influence your AI visibility, which increasingly overlaps with traditional SEO as Google expands AI Overviews and similar features.

Should I publish well-known files on all subdomains?

Yes, each subdomain should have its own well-known files if it hosts a separate website or application. AI crawlers treat subdomains as separate entities. A robots.txt on example.com does not apply to blog.example.com. If you manage subdomains centrally, ensure an automated deployment process that synchronizes the files.

The discovery layer of the web evolves alongside the complexity of its visitors. Well-known URIs were once only for bots; now they are the language you use to communicate with AI.
