HTTP status codes and AI crawlers: what you need to know
The language servers and crawlers speak
Every time an AI crawler visits your website, a conversation takes place in HTTP status codes. Your server responds with a three-digit number that tells the crawler whether the request succeeded, whether the content has moved, or whether an error occurred. For traditional search engines like Google, these codes have been crucial for decades. For AI crawlers from Perplexity, OpenAI and Anthropic, they are equally important, but the consequences of misconfiguration are different.
AI crawlers typically have a more limited crawl budget than Googlebot and are less forgiving of unexpected responses. A wrong status code can cause your content to permanently disappear from an AI model's knowledge base. In this article we discuss all relevant HTTP status codes, how AI crawlers interpret them, and what you can do to protect your visibility. If you are not yet familiar with how AI crawlers approach your site, read our article on robots.txt for AI first.
AI crawlers have a smaller crawl budget than traditional search engines. Every unnecessary redirect, soft 404 or server timeout costs valuable crawl capacity that does not come back.
2xx status codes: all is well
The 200-series status codes indicate that the request was successfully processed. This is what you want AI crawlers to see when visiting your most important pages.
- 200 OK: the standard success response. The server delivers the requested content. This is the ideal status code for all your indexable pages.
- 201 Created: primarily used in API responses after creating a resource. Not relevant for content pages.
- 204 No Content: the server successfully processed the request but sends no body back. AI crawlers cannot use this; avoid it for content pages.
The most important point with 200 responses is that the body actually contains the content the crawler expects. A common problem is the so-called "soft 404": a page that returns a 200 status code but actually displays an error page. AI crawlers can sometimes detect this situation by analyzing the content, but not always. Always use a real 404 status code for non-existent pages.
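To avoid soft 404s in practice, make sure missing content actually triggers a 404 response. A minimal Laravel sketch, assuming an Eloquent Page model with a slug column (both names are illustrative):

```php
<?php
// routes/web.php — illustrative; the Page model and slug column are assumptions

use App\Models\Page;
use Illuminate\Support\Facades\Route;

Route::get('/pages/{slug}', function (string $slug) {
    // firstOrFail() throws a ModelNotFoundException when no record matches,
    // which Laravel renders as a genuine 404 instead of a 200 with an error view
    $page = Page::where('slug', $slug)->firstOrFail();

    return view('pages.show', ['page' => $page]);
});
```

The same applies to controller actions: call abort(404) wherever your code would otherwise render a "not found" template with a 200 status.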
3xx redirects: pointing the way without getting lost
Redirects tell crawlers that content has moved to a new location. Correct use of redirects is essential for maintaining your AI visibility during URL changes.
```nginx
# Nginx redirect configuration

server {
    server_name example.com;

    # 301 Permanent Redirect: use for definitive moves
    # Permanently forward old URL to new URL
    location = /old-page {
        return 301 /new-page;
    }

    # 302 Temporary Redirect: use for temporary moves
    location = /promotion {
        return 302 /temporary-promotion-page;
    }
}

# Redirect entire domain (www to non-www)
server {
    server_name www.example.com;
    return 301 https://example.com$request_uri;
}
```

301 versus 302: a crucial difference for AI
The difference between a 301 (Moved Permanently) and a 302 (Found/Temporary Redirect) is significant for AI crawlers. With a 301 redirect, an AI crawler will remove the old URL from its index and replace it with the new one. With a 302, the crawler keeps the old URL in its index and periodically rechecks it.
- Use 301 for definitive URL changes: domain migration, URL path restructuring, page consolidation.
- Use 302 for temporary situations: A/B tests, seasonal content, temporary maintenance pages.
- Avoid redirect chains (A to B to C). Each additional redirect costs crawl budget and increases the chance the crawler gives up.
- Limit the total number of redirects on your site. If more than 10% of your URLs redirect, that is a sign of poor URL hygiene.
A common mistake is using a 302 redirect where a 301 is intended. This causes AI crawlers to keep the old URL in their index and give the new URL less priority. After a domain migration or permanent restructuring, you should always use 301 redirects. This aligns with the importance of a correct canonical URL structure.
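To spot redirect chains before a crawler does, you can trace each hop yourself. Below is a minimal sketch using PHP's cURL extension; the helper name and test URL are illustrative, and dedicated crawlers such as Screaming Frog do this at scale.

```php
<?php
// redirect_chain.php — trace every hop of a redirect manually

function traceRedirects(string $url, int $maxHops = 5): array
{
    $hops = [];

    for ($i = 0; $i < $maxHops; $i++) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_NOBODY         => true,  // a HEAD request is enough for status + Location
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => false, // follow manually so every hop is recorded
        ]);
        curl_exec($ch);

        $status   = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $location = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
        curl_close($ch);

        $hops[] = ['url' => $url, 'status' => $status];

        if ($status < 300 || $status >= 400 || !$location) {
            break; // final destination (or an error) reached
        }

        $url = $location;
    }

    return $hops;
}

// Anything longer than one hop deserves a direct 301 to the final destination
print_r(traceRedirects('https://example.com/old-page'));
```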
4xx client errors: your door is locked
The 400-series status codes indicate that the client's (the crawler's) request cannot be processed. These are the codes that can damage your AI visibility the fastest.
- 401 Unauthorized: the crawler needs valid authentication. AI crawlers cannot authenticate, so content behind a 401 is invisible.
- 403 Forbidden: the server refuses the request. This is what an AI crawler sees when your server or firewall blocks specific user agents or IP ranges. Note that a robots.txt disallow does not produce a 403; it simply tells the crawler not to request the URL at all.
- 404 Not Found: the requested resource does not exist. After repeated 404 responses, an AI crawler permanently removes the URL from its index.
- 410 Gone: the resource has been permanently removed. Stronger signal than 404; AI crawlers remove the URL faster and more definitively from their index.
- 429 Too Many Requests: the crawler is sending too many requests. The crawler reduces its crawl speed or stops temporarily.
```php
<?php
// app/Http/Middleware/AiCrawlerRateLimit.php
// Laravel middleware for differentiated rate limiting per crawler

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;

class AiCrawlerRateLimit
{
    public function handle(Request $request, Closure $next)
    {
        $userAgent = strtolower($request->userAgent() ?? '');
        $aiCrawlers = ['gptbot', 'perplexitybot', 'claudebot', 'anthropic-ai'];

        $isAiCrawler = false;
        foreach ($aiCrawlers as $crawler) {
            if (str_contains($userAgent, $crawler)) {
                $isAiCrawler = true;
                break;
            }
        }

        if ($isAiCrawler) {
            // Allow at most 60 requests per minute per crawler IP
            $key = 'ai-crawler:' . $request->ip();

            if (RateLimiter::tooManyAttempts($key, 60)) {
                return response('Too Many Requests', 429)
                    ->header('Retry-After', 60);
            }

            RateLimiter::hit($key, 60);
        }

        return $next($request);
    }
}
```

Always include a Retry-After header with a 429 response. This tells the AI crawler how many seconds it should wait before retrying. Without this header, the crawler may mark your site as unreliable.
5xx server errors: your server is failing
The 500-series status codes indicate that an error occurred on the server side. These codes are the most damaging to your AI visibility, because they indicate that your website is unreliable.
- 500 Internal Server Error: a general server error. AI crawlers will retry the URL later, but after repeated 500 errors they reduce crawl frequency.
- 502 Bad Gateway: the server acted as a gateway and received an invalid response from the upstream server. Often a sign of overload.
- 503 Service Unavailable: the server is temporarily unavailable, usually due to maintenance or overload. Always include a Retry-After header.
- 504 Gateway Timeout: the upstream server did not respond in time. AI crawlers with a limited time budget will give up quickly.
```nginx
# Correct use of 503 during scheduled maintenance
# Nginx configuration

server {
    location / {
        # Serve the maintenance response while the flag file exists
        if (-f /var/www/maintenance.flag) {
            return 503;
        }
        # Normal configuration...
    }

    error_page 503 @maintenance;

    location @maintenance {
        add_header Retry-After 3600 always;
        default_type text/html;
        root /var/www/maintenance;
        try_files /index.html =503;
    }
}
```

The difference between a 503 with a Retry-After header and a 500 without further information is enormous. A 503 explicitly tells AI crawlers that the situation is temporary and when they can come back. A 500 gives no indication and can, after repeated occurrences, lead to a permanent reduction of your crawl frequency. This principle applies equally to how you set up your HTTPS configuration and security headers: consistency and clarity build trust.
Dive deeper: Robots.txt for AI crawlers | Canonical URLs and duplicate prevention | Security headers for AI trust
Monitoring and alerting for status code issues
The biggest risk with status code problems is not noticing them. A broken redirect or an intermittent 500 error can silently undermine your AI visibility for weeks. Therefore set up monitoring that alerts you to abnormal patterns.
- Monitor your server logs for 4xx and 5xx responses specifically for AI crawler user agents (a log-parsing sketch follows this list).
- Set up alerting when the percentage of 5xx responses exceeds 1%.
- Use tools like Google Search Console to detect crawl errors that Google's crawler encounters.
- Check your redirects periodically with tools like Screaming Frog or Sitebulb to detect redirect chains and loops.
- Log the Retry-After headers your server sends and verify that AI crawlers respect them.
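As a starting point for the first two items, the sketch below scans an access log for error responses from AI crawler user agents and flags a 5xx rate above 1%. The log path and the standard "combined" log format are assumptions; adapt both to your setup, and stream the file rather than loading it whole once logs grow large.

```php
<?php
// crawler_error_report.php — a minimal sketch, assuming an nginx/Apache
// "combined" access log at /var/log/nginx/access.log (path is an assumption)

$aiCrawlers = ['GPTBot', 'PerplexityBot', 'ClaudeBot', 'anthropic-ai'];

$total = 0;        // all requests from AI crawlers
$errors = [];      // error responses per crawler and status code
$serverErrors = 0; // 5xx responses from AI crawlers

foreach (file('/var/log/nginx/access.log') as $line) {
    // Combined format: ... "GET /path HTTP/1.1" 200 1234 "referer" "user-agent"
    if (!preg_match('/" (\d{3}) \S+ "[^"]*" "([^"]*)"/', $line, $m)) {
        continue;
    }
    [, $status, $userAgent] = $m;

    foreach ($aiCrawlers as $crawler) {
        if (stripos($userAgent, $crawler) === false) {
            continue;
        }
        $total++;
        if ((int) $status >= 400) {
            $errors[$crawler][$status] = ($errors[$crawler][$status] ?? 0) + 1;
        }
        if ((int) $status >= 500) {
            $serverErrors++;
        }
    }
}

print_r($errors);

// Alert when more than 1% of AI crawler requests hit a 5xx response
if ($total > 0 && $serverErrors / $total > 0.01) {
    echo "ALERT: 5xx rate for AI crawlers is above 1%\n";
}
```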
An AI crawler that gets a 500 error three times in a row may never come back. Invest in monitoring before you lose visibility that you cannot regain.
Key takeaways
- HTTP status codes are the base language through which your server communicates with AI crawlers. Wrong codes lead directly to loss of AI visibility.
- Use 301 for permanent and 302 for temporary redirects. Avoid redirect chains and limit the total number of redirects.
- Soft 404s (200 status code on error pages) are particularly harmful: always use real 404 or 410 status codes for non-existent content.
- Always include a Retry-After header with 429 and 503 responses so AI crawlers know when they can come back.
- Monitor status code patterns specifically for AI crawler user agents and set up alerting for anomalies.
Frequently asked questions
What is the difference between a 404 and a 410 for AI crawlers?
A 404 (Not Found) indicates that the resource was not found, but does not imply that this is permanent. AI crawlers may retry the URL later. A 410 (Gone) explicitly indicates that the resource has been permanently removed and will never return. AI crawlers remove a 410 URL faster and more definitively from their index. Use 410 when you are certain that a page will never come back.
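For illustration, a permanently removed section can answer with a 410 from a single Laravel route; abort(410) sends the Gone status (the URL pattern is an assumption):

```php
<?php
// routes/web.php — illustrative: a product line that has been discontinued for good

use Illuminate\Support\Facades\Route;

Route::get('/products/discontinued-line/{any}', function () {
    abort(410); // Gone: a stronger removal signal than 404
})->where('any', '.*');
```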
How many redirects can an AI crawler follow?
Most AI crawlers follow a maximum of 5 consecutive redirects. After 5 redirects, the chain is considered broken and the crawler stops. In practice, any chain of more than 2 redirects is already problematic, because each redirect costs crawl budget. Aim for every redirect to lead directly to the final destination, without intermediate steps.
What should I do if my site gets overloaded by AI crawlers?
Implement rate limiting specifically for AI crawler user agents and send 429 responses with a Retry-After header. Check whether your robots.txt contains a crawl-delay (not all AI crawlers respect this). Also consider whether your server has sufficient capacity: AI crawlers often visit multiple pages in quick succession. If the problem persists, contact the crawler operator through their officially published contact details.
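For reference, a crawl-delay hint in robots.txt looks like the snippet below. It is a nonstandard directive and, as noted above, not every AI crawler honors it:

```
User-agent: GPTBot
Crawl-delay: 10

User-agent: PerplexityBot
Crawl-delay: 10
```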
How do I detect soft 404 pages on my website?
The simplest method is requesting a URL that you know does not exist and checking the HTTP status code. If your server returns a 200 instead of a 404, you have a soft 404 problem. Google Search Console also reports soft 404s. Additionally, you can use a crawl tool like Screaming Frog to scan all pages and check whether the status code matches the actual content.
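A quick way to automate that first check is to probe a URL that cannot exist and compare status codes. A minimal sketch; the random suffix keeps the probe from colliding with real content:

```php
<?php
// soft_404_check.php — request a page that should not exist and verify the status code

$probe = 'https://example.com/this-page-should-not-exist-' . bin2hex(random_bytes(4));

$ch = curl_init($probe);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

echo $status === 404
    ? "OK: the server returns a real 404\n"
    : "Possible soft 404: the server returned {$status}\n";
```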
Is a 302 redirect bad for AI visibility?
A 302 redirect is not inherently bad, but is often used incorrectly. When you use a 302 for a permanent move, AI crawlers keep the old URL in their index and give the new URL less priority. PageRank and citation authority are not transferred. Use 302 only for truly temporary situations. For everything that is permanent, a 301 is the correct choice.
HTTP status codes are the difference between an AI crawler that finds and indexes your content, and one that gives up and never comes back. Treat every response as an opportunity to build trust.
How does your website score on AI readiness?
Get your AEO score within 30 seconds and discover what you can improve.