
How to Check If AI Bots Can Crawl Your Website

If AI search engines can't crawl your website, your content will never appear in AI-generated answers. Many sites block AI bots without realizing it — either through overly restrictive robots.txt rules or CMS defaults. Here's how to check and fix it.

The 10 Major AI Crawlers

These are the AI crawlers that read websites to power AI search engines and language models. If any of them are blocked in your robots.txt, that AI service can't use your content.

  • GPTBot — OpenAI (indexes content for ChatGPT)
  • OAI-SearchBot — OpenAI (builds the ChatGPT search index)
  • ChatGPT-User — OpenAI (real-time browsing when a user asks ChatGPT to visit a URL)
  • ClaudeBot — Anthropic (powers Claude)
  • PerplexityBot — Perplexity AI
  • Google-Extended — Google (controls Gemini model training only; does not affect AI Overviews)
  • Amazonbot — Amazon (Alexa and Amazon AI services)
  • Bytespider — ByteDance (TikTok AI features)
  • CCBot — Common Crawl (used by many AI companies for training data)
  • cohere-ai — Cohere (enterprise AI models)

How to Check Your robots.txt

Open your browser and go to yoursite.com/robots.txt. This file controls which bots can access which pages. Look for any rules that might block AI crawlers.

Patterns That Block AI Bots

# This blocks ALL bots including AI crawlers:
User-agent: *
Disallow: /

# This specifically blocks GPTBot:
User-agent: GPTBot
Disallow: /

# This blocks a common set of AI bots:
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Disallow: /

robots.txt patterns that prevent AI citation
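If you'd rather test these patterns programmatically than eyeball them, Python's standard-library robots.txt parser behaves the way a compliant crawler does. A minimal sketch (the example.com URL is just a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The "blocks ALL bots" pattern from above, parsed as a crawler would see it.
rules = [
    "User-agent: *",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

# AI crawlers identify themselves by user agent, so the wildcard
# rule applies to every one of them.
for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    print(bot, rp.can_fetch(bot, "https://example.com/any-page"))  # prints False for each
```

The same `parse()` / `can_fetch()` calls work for the other two patterns; just swap in the relevant lines.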

The AI-Friendly robots.txt

# Allow all AI crawlers
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Amazonbot
Allow: /

# Block sensitive paths for all other bots
# (a crawler matched by a named group above follows only that group,
# so repeat these Disallow lines there if AI bots must stay out too)
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /dashboard/

AI-friendly robots.txt that allows crawling while protecting private routes
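One subtlety worth knowing: under the robots.txt standard (RFC 9309), a crawler obeys only the most specific group that matches it, so a bot with its own named group ignores the `User-agent: *` section entirely. A sketch with Python's standard-library parser, using a trimmed version of the file above (example.com and the paths are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Trimmed version of the AI-friendly file: one named group, one wildcard group.
rules = [
    "User-agent: GPTBot",
    "Allow: /",
    "",
    "User-agent: *",
    "Disallow: /admin/",
]

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))     # True: allowed
print(rp.can_fetch("SomeOtherBot", "https://example.com/admin/"))  # False: wildcard rule applies
# Caveat: GPTBot matches its own named group, so the wildcard group's
# /admin/ rule never applies to it.
print(rp.can_fetch("GPTBot", "https://example.com/admin/"))        # True: also allowed
```

If AI crawlers must also stay out of `/admin/` and friends, repeat those Disallow lines inside each named group.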

Common Mistakes

  • Wildcard blocks — a "User-agent: *" group with "Disallow: /" blocks everything, including AI bots.
  • CMS defaults — WordPress security plugins (Wordfence, Sucuri) and some themes block AI crawlers by default.
  • CDN/WAF rules — Cloudflare and similar services may rate-limit or block AI bot user agents.
  • Selective blocking — Blocking GPTBot but allowing others means you're invisible on ChatGPT but visible on Perplexity.
  • Not testing after changes — Always verify your robots.txt is serving the expected content after deploying updates.

Should You Allow All AI Bots?

For most websites, yes. AI citations drive brand visibility and increasingly drive traffic. Some sites choose to selectively block specific crawlers (usually for content licensing reasons), but for most businesses the visibility benefit far outweighs any concerns.

The only exception: if you have premium content behind a paywall, you may want to block AI crawlers from those specific paths while allowing them on your public pages.
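In robots.txt terms, that split might look like the sketch below. The /premium/ path is a placeholder for your actual paywalled directory, and the bot list is abbreviated; repeat the rule for each AI crawler you want to keep out.

```
# Keep AI crawlers off paywalled content; everything else stays crawlable
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Disallow: /premium/
```

Because these groups contain only the one Disallow line, the listed bots can still crawl every public page by default.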

Automate the Check

Instead of manually reading your robots.txt and cross-referencing bot names, run a free AEO scan. It automatically checks all 10 major AI crawler user agents against your robots.txt and tells you exactly which bots are allowed and which are blocked.
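If you'd rather script the check yourself, here is a rough sketch using Python's standard-library parser. It inlines a sample robots.txt for the demo, but you could paste in your own file's text; the bot list is the one from earlier in this article.

```python
from urllib.robotparser import RobotFileParser

# The 10 major AI crawler user agents covered above.
AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
    "Google-Extended", "Amazonbot", "Bytespider", "CCBot", "cohere-ai",
]

def check_ai_bots(robots_txt: str, site: str = "https://example.com/") -> dict:
    """Return {bot: allowed?} for whether each AI crawler may fetch the site root."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, site) for bot in AI_BOTS}

# Example: a file that singles out GPTBot.
sample = "User-agent: GPTBot\nDisallow: /\n"
for bot, allowed in check_ai_bots(sample).items():
    print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

With that sample file, GPTBot reports BLOCKED and the other nine report allowed, since bots with no matching group may crawl by default.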

Check your AEO score for free

Enter your URL and see how your site scores across all 6 AEO factors. No signup required.
