Robots.txt Generator — Block AI Crawlers (GPTBot, Claude) & Configure Bots

What Is robots.txt?

robots.txt is a plain-text file placed at the root of your website (https://yoursite.com/robots.txt) that tells web crawlers (search engines, AI training bots, SEO tools, archivers) which parts of your site they should and shouldn't crawl. The file follows the Robots Exclusion Protocol, first proposed in 1994 and formally standardized as RFC 9309 in 2022, and virtually every well-behaved crawler respects it. It is not a security mechanism (bots can ignore it), but it is the standard way to communicate your crawl preferences.

This generator creates a syntactically valid robots.txt from a visual interface: pick a preset, check the bots you want to block from a curated list of 40+ known crawlers, optionally add custom Allow/Disallow rules, and link your sitemap. The preview updates live.

How to Use the Generator

Step 1 — Pick a preset. Five options cover the most common cases:

Allow all — the default. Permits every crawler. Use for public sites that want maximum search visibility.
Block all — disallows everyone. Only for development/staging sites that should never appear in search.
Block AI crawlers ⭐ — blocks 15+ bots used to harvest training data (GPTBot, Google-Extended, ClaudeBot, etc.) while keeping search engine bots welcome. The most popular preset in 2025-2026.
WordPress — sensible defaults for a WordPress site (block /wp-admin/, allow admin-ajax.php).
Custom — start from a clean slate.

Step 2 — Fine-tune the bot list. Each crawler appears as a checkbox grouped by category: search engines, AI training bots, SEO crawlers, social previews, web archives. Hover over a name to see what the bot does. Check the boxes for the bots you want to block.

Step 3 — Add custom rules (optional). Click "Add User-agent rule" to define your own. Each row has a User-agent (use * for all), space-separated allow paths, and space-separated disallow paths.
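Each custom row maps onto one robots.txt group: a User-agent line followed by its Allow and Disallow lines. The Python sketch below shows one plausible way to do that conversion; `render_rule` is a hypothetical helper, not the generator's actual code.

```python
def render_rule(user_agent: str, allow: str = "", disallow: str = "") -> str:
    """Render one custom-rule row as a robots.txt group (hypothetical helper).

    `allow` and `disallow` are space-separated path lists, as typed in the row.
    """
    # Assumption: an empty user-agent field falls back to * (all bots).
    lines = [f"User-agent: {user_agent.strip() or '*'}"]
    # str.split() with no argument splits on any whitespace and drops empty items.
    lines += [f"Allow: {path}" for path in allow.split()]
    lines += [f"Disallow: {path}" for path in disallow.split()]
    if len(lines) == 1:
        # A group needs at least one rule; an empty Disallow means "no restriction".
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


print(render_rule("Bingbot", allow="/blog/public/", disallow="/blog/ /tmp/"))
```

Writing the Allow lines first is a deliberate choice: parsers that use the most specific match (as Google does) ignore order, while simpler first-match parsers then still see the exception before the broader Disallow.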

Step 4 — Add your sitemap URL. Recommended for search engine discovery.

Step 5 — Copy or download. Save the file as robots.txt (all lowercase, with a single .txt extension; watch for editors that save it as robots.txt.txt), upload it to your site's root, and verify by visiting yoursite.com/robots.txt.
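Visiting the URL in a browser is enough for a one-off check. If you want to script it, here is a minimal sketch using only Python's standard library. The helper names are hypothetical, and the heuristics (HTTP 200, a text/plain content type as RFC 9309 expects, no HTML markup in the body) are assumptions about what a broken deployment usually looks like, not an official test.

```python
import urllib.error
import urllib.request


def looks_like_robots_txt(status: int, content_type: str, body: str) -> bool:
    """Heuristic: True if a response looks like a real robots.txt file."""
    if status != 200:
        return False
    # robots.txt should be served as text/plain; text/html usually means a 404
    # page or a homepage fallback (common with single-page-app rewrites).
    if not content_type.lower().startswith("text/plain"):
        return False
    # Guard against HTML served with the wrong content type.
    return "<html" not in body.lower()


def check_deployment(site: str) -> bool:
    """Fetch https://<site>/robots.txt and apply the heuristic above.

    `site` is a bare host name such as "yoursite.com".
    """
    url = f"https://{site}/robots.txt"
    try:
        # urlopen follows redirects, so a redirect to the homepage is caught too.
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            content_type = resp.headers.get("Content-Type", "")
            return looks_like_robots_txt(resp.status, content_type, body)
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for 4xx/5xx responses such as a 404.
        return False
```

For example, `check_deployment("yoursite.com")` returns False when the server answers with a 404 or falls back to your HTML homepage. Network errors such as a failed DNS lookup are not caught and will raise.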

Blocking AI Training Crawlers

Since 2023, most major AI companies have published dedicated user-agent tokens for their crawlers, and they state that they honor robots.txt. To block them all in one click, use the "Block AI crawlers" preset. Here's what each major AI bot does:

GPTBot — OpenAI's training-data crawler. Blocking it tells OpenAI not to use your content to train future GPT models.
ChatGPT-User — fetches content when a ChatGPT user asks a real-time question about your URL.
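Google-Extended — not a separate crawler but a robots.txt control token: Google fetches pages with its regular crawlers and checks this token to decide whether your content may be used for its AI models (Gemini, Vertex AI). Blocking it does NOT affect Google Search ranking, which is governed by Googlebot.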
ClaudeBot / anthropic-ai / Claude-Web — Anthropic's crawlers for Claude training and real-time fetches.
PerplexityBot — Perplexity's search-engine-like crawler.
CCBot — Common Crawl, the open dataset used by many AI labs as training data.
Amazonbot — Amazon's web crawler for Alexa and product information.
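Applebot-Extended — a control token rather than a separate crawler: Apple crawls with Applebot and checks this token to decide whether your content may be used to train its AI models (Apple Intelligence). Blocking it does not remove your pages from Apple's search features.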
Bytespider — ByteDance / TikTok crawler.
Meta-ExternalAgent — Meta's AI training crawler (Llama).
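The preset boils down to one group that lists every AI user-agent token above a single Disallow: /, plus an open group for everyone else. The sketch below builds that file in Python and sanity-checks it with the standard library's urllib.robotparser. The token list is copied from this section and is an illustrative sample, not a complete or authoritative list; `block_ai_robots_txt` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Tokens from the list above: a sample, not an exhaustive or authoritative list.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "Google-Extended", "ClaudeBot", "anthropic-ai",
    "Claude-Web", "PerplexityBot", "CCBot", "Amazonbot", "Applebot-Extended",
    "Bytespider", "Meta-ExternalAgent",
]


def block_ai_robots_txt(bots, sitemap=None):
    """Build a robots.txt that blocks `bots` entirely and allows everyone else."""
    # One group may list several User-agent lines that share the same rules.
    lines = [f"User-agent: {bot}" for bot in bots]
    lines += ["Disallow: /", "", "User-agent: *", "Allow: /"]
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


text = block_ai_robots_txt(AI_BOTS, "https://yoursite.com/sitemap.xml")
print(text)

# Sanity-check the result with Python's standard-library parser.
parser = RobotFileParser()
parser.parse(text.splitlines())
print(parser.can_fetch("GPTBot", "https://yoursite.com/blog/post"))     # False
print(parser.can_fetch("Googlebot", "https://yoursite.com/blog/post"))  # True
```

urllib.robotparser is simpler than Google's parser (for example, it does not understand * and $ wildcards inside paths), so treat it as a quick sanity check rather than the final word on how a specific crawler reads your file.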

Allow vs Disallow vs Crawl-delay

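User-agent: — specifies which bot the following rules apply to. Use * for all. A crawler that finds a group naming it follows only that group and ignores the * group; * applies only to bots with no group of their own.
Disallow: /path — tells the bot NOT to crawl URLs whose path starts with /path. This is a plain prefix match: /path also blocks /pathology, so use /path/ to target only the folder.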
Disallow: (empty) — allows everything (default).
Disallow: / — disallows the entire site.
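Allow: /path — explicit allow, useful for opening sub-paths under a disallowed parent. When an Allow and a Disallow rule both match, Google (and RFC 9309) apply the most specific rule, meaning the one with the longest matching path.
Crawl-delay: N — asks bots to wait N seconds between requests. It is not part of RFC 9309: Bing honors it, but Google ignores it (see the FAQ below for how to slow Googlebot down).
Sitemap: URL — points crawlers to your sitemap. It is independent of User-agent groups, so it can appear anywhere in the file; always use the full absolute URL.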
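The prefix matching, Allow exceptions and group selection described above can be checked with Python's standard-library parser. The bot names and paths in this sketch are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A made-up file: one group for all bots, plus a separate group for Bingbot.
ROBOTS = """\
User-agent: *
Allow: /private/press-kit/
Disallow: /private/
Disallow: /tmp

User-agent: Bingbot
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

# Disallow is a prefix match, so /tmp also blocks /tmpfiles/.
print(parser.can_fetch("AnyBot", "/tmpfiles/a.txt"))              # False
# Allow reopens one sub-folder of a disallowed folder.
print(parser.can_fetch("AnyBot", "/private/press-kit/logo.png"))  # True
print(parser.can_fetch("AnyBot", "/private/notes.html"))          # False
# Bingbot has its own group, so the * rules don't apply to it at all.
print(parser.can_fetch("Bingbot", "/private/notes.html"))         # True
print(parser.can_fetch("Bingbot", "/search?q=robots"))            # False
```

One difference from Google: Python's parser stops at the first rule that matches, which is why the Allow line is listed before the broader Disallow. Google applies the longest matching path instead, so for this file both give the same answers.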

Common Patterns

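Standard site, allow everyone:
    User-agent: *
    Allow: /
    Sitemap: https://yoursite.com/sitemap.xml
WordPress: disallow /wp-admin/ but allow /wp-admin/admin-ajax.php:
    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
Shopify / e-commerce: disallow /cart, /checkout and /account:
    User-agent: *
    Disallow: /cart
    Disallow: /checkout
    Disallow: /account
Block staging:
    User-agent: *
    Disallow: /
Block AI but allow search: one Disallow: / group per AI bot, plus an open group for search engines:
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: *
    Allow: /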
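Before uploading, you can spot-check a pattern against the URLs you care about. The sketch below runs the WordPress and e-commerce patterns through Python's urllib.robotparser; the sample paths and the `allowed` helper are made up for illustration. One caveat: this parser applies the first rule that matches, whereas Google uses the most specific one, so the Allow line is placed before the Disallow line here to get the same answer from both.

```python
from urllib.robotparser import RobotFileParser

WORDPRESS = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

SHOP = """\
User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /account
"""


def allowed(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Return whether `agent` may crawl `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


for path in ["/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/blog/hello-world/"]:
    print("WordPress", path, allowed(WORDPRESS, path))
for path in ["/cart", "/checkout/step-2", "/products/mug"]:
    print("Shop", path, allowed(SHOP, path))
```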

Where to Place the File

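The file MUST be at the root of the host: https://yoursite.com/robots.txt, not in a subfolder. It applies only to that exact protocol, host and port, so a subdomain such as blog.yoursite.com needs its own file. Use an all-lowercase filename. Verify deployment by visiting that URL in your browser: you should see the raw text, not a 404 or your homepage. With static-site generators, put the file wherever static assets are copied to the output root: public/ for Next.js and Astro, static/ for Hugo, the project root for Jekyll. For WordPress, use a plugin like Yoast SEO or upload it to the root via FTP. On Cloudflare Pages and Vercel, make sure it ends up at the root of the deployed build output (for most frameworks, the public/ directory).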
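Because each protocol, host and port combination has its own robots.txt, the file that governs any page is found by dropping everything after the host. A minimal Python sketch of that rule, with `robots_url_for` as a hypothetical helper name:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs `page_url`.

    The scheme and host (including any port) are kept; the page's path,
    query and fragment are irrelevant.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://yoursite.com/blog/post?id=7#top"))
# https://yoursite.com/robots.txt
print(robots_url_for("https://blog.yoursite.com/2024/hello/"))
# https://blog.yoursite.com/robots.txt
```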

Frequently Asked Questions

What is robots.txt?

A plain-text file at yoursite.com/robots.txt that tells crawlers which parts of your site to crawl or skip. Defined by the Robots Exclusion Protocol from 1994. It's a request, not a wall — well-behaved bots respect it, but it's not a security mechanism.

Will blocking AI crawlers actually stop them?

Mostly yes, for major AI labs. OpenAI (GPTBot, ChatGPT-User), Anthropic (ClaudeBot, anthropic-ai), Google (Google-Extended), Perplexity, Amazon and Apple all state publicly that they respect robots.txt and have documented their user-agents. Common Crawl (CCBot) respects it too. Smaller scrapers and bad actors may ignore it, since robots.txt is a request, not a wall. Blocking also only affects future crawls; it does not remove content collected before you added the rule.

Does blocking Google-Extended hurt my Google search ranking?

No. Google-Extended is a separate robots.txt token that only controls whether content Google crawls may be used for its AI models (Gemini, Vertex AI). It is not the crawler behind Google Search: Googlebot is unaffected, so you can block Google-Extended and still rank normally in Google Search.

What's the difference between Disallow: / and Disallow:?

Disallow: / blocks the entire site. Disallow: with nothing after the colon means "no restrictions" — equivalent to allowing everything. Be precise: forgetting the slash creates a permissive rule, not a restrictive one.
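The difference is easy to confirm with Python's urllib.robotparser; `can_crawl` and the ExampleBot name below are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, path: str = "/any/page.html") -> bool:
    """Hypothetical helper: may a generic bot fetch `path` under this file?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("ExampleBot", path)


print(can_crawl("User-agent: *\nDisallow:"))    # True: empty value, nothing blocked
print(can_crawl("User-agent: *\nDisallow: /"))  # False: every path starts with /
```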

Can I have multiple sitemaps?

Yes — just add multiple Sitemap: lines. Each on its own line, full absolute URL each time. Common for large sites that split sitemaps by content type (pages, posts, products, images). Search engines will fetch each one.
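Crawlers collect every Sitemap line regardless of where it appears in the file. Python's urllib.robotparser (site_maps() needs Python 3.8 or later) shows this with a made-up file that splits sitemaps by content type:

```python
from urllib.robotparser import RobotFileParser

# A made-up file with Sitemap lines both before and after the rules.
ROBOTS = """\
Sitemap: https://yoursite.com/sitemap-pages.xml
Sitemap: https://yoursite.com/sitemap-posts.xml

User-agent: *
Disallow:

Sitemap: https://yoursite.com/sitemap-products.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())
for url in parser.site_maps():
    print(url)
```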

Where do I put the file?

At the root of your site: https://yoursite.com/robots.txt (all lowercase, saved with a single .txt extension). Verify by visiting that URL: you should see the raw text, not a 404 or HTML page. For static sites, place it in your build's public folder; for WordPress, use an SEO plugin or upload it via FTP.

Does Google honor Crawl-delay?

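No. Google ignores the Crawl-delay directive. Google retired the crawl-rate setting in Search Console in early 2024; if Googlebot is overloading your server, Google recommends temporarily returning 500, 503 or 429 status codes. Bing honors Crawl-delay, and support among other crawlers varies.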
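If you are writing a crawler that does honor Crawl-delay, Python's urllib.robotparser exposes the value per user-agent. In this made-up file only Bingbot's group sets a delay, so any other bot gets None:

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: Bingbot
Crawl-delay: 10
Disallow: /search

User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())
print(parser.crawl_delay("Bingbot"))    # 10
print(parser.crawl_delay("Googlebot"))  # None: the * group sets no delay
```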

Is robots.txt a privacy or security feature?

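No, it's neither. Anyone can read your robots.txt (it's public), so listing "private" URLs there actually advertises them to bad actors. It also controls crawling, not indexing: a disallowed URL can still show up in search results if other pages link to it. For real protection, use authentication or server-level access controls. To keep an individual page out of search results, use a noindex meta tag and leave that page crawlable, because a crawler blocked by robots.txt never sees the tag.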
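To see why listing "private" URLs backfires, note how little it takes to harvest them. This hypothetical `disallowed_paths` helper, a deliberately naive line parser, pulls every Disallow value out of a robots.txt body, and the same thing works against any site's public file:

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """List every non-empty Disallow value in a robots.txt body."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


print(disallowed_paths("User-agent: *\nDisallow: /secret-admin-panel/  # hide this\nDisallow:\n"))
# ['/secret-admin-panel/']
```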
