Tools

robots.txt Generator

Block AI crawlers, control search engine access, and generate a ready-to-use robots.txt file. Toggle bots on and off or write custom rules.

Place robots.txt in the root of your site so it's accessible at https://yoursite.com/robots.txt

Which AI Bots Are Crawling Your Site?

These are the known AI crawler user-agents and what they're used for. Blocking them via robots.txt tells them not to scrape your content for training or retrieval.

| Bot Name | Company | Purpose | User-Agent String |
| --- | --- | --- | --- |
| GPTBot | OpenAI | Training data collection for GPT models | GPTBot |
| ChatGPT-User | OpenAI | Real-time browsing in ChatGPT (user-initiated) | ChatGPT-User |
| OAI-SearchBot | OpenAI | ChatGPT search results (SearchGPT) | OAI-SearchBot |
| ClaudeBot | Anthropic | Training data collection for Claude models | ClaudeBot |
| Claude-Web | Anthropic | Real-time browsing in Claude (user-initiated) | Claude-Web |
| Google-Extended | Google | Training data for Gemini (separate from search indexing) | Google-Extended |
| CCBot | Common Crawl | Open web archive used by many AI companies for training | CCBot |
| Meta-ExternalAgent | Meta | Training data collection for Meta AI / Llama | Meta-ExternalAgent |
| FacebookBot | Meta | AI features on Facebook and Instagram | FacebookBot |
| Bytespider | ByteDance | Training data for ByteDance AI products | Bytespider |
| PerplexityBot | Perplexity | AI search engine indexing and answer generation | PerplexityBot |
| Amazonbot | Amazon | Alexa AI and Amazon product search | Amazonbot |
| Applebot-Extended | Apple | Training data for Apple Intelligence features | Applebot-Extended |
| cohere-ai | Cohere | Training data for Cohere language models | cohere-ai |
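As a worked example, a robots.txt that opts out of every crawler in the table can be written as a single group: the Robots Exclusion Protocol (RFC 9309) lets one rule set apply to several User-agent lines at once. The user-agent strings below are taken verbatim from the table:

```
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: Google-Extended
User-agent: CCBot
User-agent: Meta-ExternalAgent
User-agent: FacebookBot
User-agent: Bytespider
User-agent: PerplexityBot
User-agent: Amazonbot
User-agent: Applebot-Extended
User-agent: cohere-ai
Disallow: /
```

If you're worried about a crawler mishandling grouped User-agent lines, the more verbose alternative is to repeat Disallow: / under each bot in its own group.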

robots.txt Syntax Reference

User-agent

User-agent: Googlebot

Specifies which crawler the following rules apply to. Use * to target all crawlers.

Disallow

Disallow: /private/

Tells the bot not to crawl this path. An empty value (Disallow: with nothing after it) means everything is allowed.

Allow

Allow: /public/

Explicitly allows crawling a path, useful for overriding a broader Disallow rule. Not supported by all bots.

Sitemap

Sitemap: https://example.com/sitemap.xml

Points crawlers to your XML sitemap. Must be a full URL. You can list multiple sitemaps.

Crawl-delay

Crawl-delay: 10

Requests a delay (in seconds) between requests. Respected by Bing and Yandex, ignored by Google.

Wildcards

Disallow: /*.pdf$

Use * to match any sequence and $ to match end of URL. Supported by Google and Bing.
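Combined, the directives above form a complete file. A sketch, where the paths and sitemap URL are placeholders:

```
# Keep all crawlers out of /private/, except the docs subfolder
User-agent: *
Disallow: /private/
Allow: /private/docs/
Disallow: /*.pdf$

# Block one specific crawler entirely
User-agent: GPTBot
Disallow: /

# Sitemap lines sit outside any group and must be absolute URLs
Sitemap: https://example.com/sitemap.xml
```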

Frequently Asked Questions

Does robots.txt actually block AI training?

robots.txt is a voluntary standard — it relies on crawlers choosing to respect it. Major AI companies (OpenAI, Anthropic, Google, Meta) have committed to honoring robots.txt for their AI training crawlers. However, it does not provide a legal or technical guarantee. Content already crawled before you added the block may still be in training datasets.

Which AI companies respect robots.txt?

OpenAI (GPTBot, ChatGPT-User, OAI-SearchBot), Anthropic (ClaudeBot, Claude-Web), Google (Google-Extended), Apple (Applebot-Extended), and Meta (Meta-ExternalAgent) all publicly respect robots.txt. Common Crawl (CCBot) also honors it. Smaller players vary — PerplexityBot has been called out for inconsistent compliance but has since improved.

Where do I put robots.txt?

Place the file in the root directory of your website so it's accessible at https://yoursite.com/robots.txt. For static site generators, put it in your public or static folder. For ZeroDeploy, Netlify, and Cloudflare Pages, put it in your build output directory alongside index.html.

Can I block AI crawlers without hurting my SEO?

Yes. AI training crawlers (GPTBot, ClaudeBot, Google-Extended, CCBot) are completely separate from search engine indexing crawlers (Googlebot, Bingbot). Blocking AI crawlers has no effect on your search rankings. Just make sure you don't accidentally block Googlebot or Bingbot.
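One way to sanity-check this separation is Python's built-in urllib.robotparser, which implements the standard matching rules. Given a file that blocks GPTBot but leaves the default group open, the AI crawler is denied while Googlebot is unaffected (example.com is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks the GPTBot training crawler,
# while the default group (which Googlebot falls into) allows everything.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# AI training crawler is blocked...
print(rp.can_fetch("GPTBot", "https://example.com/page"))     # False
# ...while the search engine crawler is unaffected.
print(rp.can_fetch("Googlebot", "https://example.com/page"))  # True
```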

What's the difference between blocking GPTBot and ChatGPT-User?

GPTBot collects training data for OpenAI's models — blocking it prevents your content from being used in future model training. ChatGPT-User is the agent ChatGPT uses when a user asks it to visit a URL in real time. Blocking ChatGPT-User prevents ChatGPT from fetching your pages during conversations but doesn't affect training.
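In practice, opting out of training while keeping user-initiated browsing means listing only GPTBot:

```
# Opt out of model training
User-agent: GPTBot
Disallow: /

# No rule for ChatGPT-User, so real-time fetches on a user's behalf still work
```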

Should I block Common Crawl (CCBot)?

Common Crawl is a nonprofit that builds an open web archive. Many AI companies use this archive for training data, so blocking CCBot reduces your content's availability to a wide range of AI systems at once. However, Common Crawl data is also used for academic research and web analytics.

Deploy your site with zero configuration

ZeroDeploy serves your robots.txt automatically from the root of your site. Deploy in seconds with built-in forms, analytics, and custom domains.

Get Started Free