Robots.txt Generator

Generate a clean robots.txt file in seconds. Block 40+ known crawlers with one click: AI training bots (GPTBot, Google-Extended, ClaudeBot, PerplexityBot, CCBot, Amazonbot), aggressive SEO crawlers (SemrushBot, AhrefsBot), or just configure search engines. Add custom Allow/Disallow rules, set your sitemap URL, and copy the result. Live preview, free, runs in your browser.

What Is robots.txt?

robots.txt is a plain-text file placed at the root of your website (https://yoursite.com/robots.txt) that tells web crawlers (search engines, AI training bots, SEO tools, archivers) which parts of your site they should and shouldn't access. The file follows the Robots Exclusion Protocol, which began as an informal convention in 1994 and was formalized as RFC 9309 in 2022; virtually every well-behaved crawler respects it. It is not a security mechanism (bots can ignore it), but it is the standard way to communicate your crawl preferences.
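As a rough illustration of how a compliant crawler consults these rules, the sketch below uses Python's standard urllib.robotparser module. The rules, the /private/ path, and the ExampleBot name are invented for the example; a real crawler would load the live file with set_url() and read() rather than parse an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; a live crawler would fetch these from /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is an invented crawler name used only for illustration.
allowed_home = parser.can_fetch("ExampleBot", "https://yoursite.com/")
allowed_private = parser.can_fetch(
    "ExampleBot", "https://yoursite.com/private/report.html"
)

print(allowed_home)     # True
print(allowed_private)  # False
```

The check is voluntary: nothing stops a crawler from skipping it, which is why the file expresses preferences rather than enforcing them.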

This generator creates a syntactically valid robots.txt from a visual interface: pick a preset, check the bots you want to block from a curated list of 40+ known crawlers, optionally add custom Allow/Disallow rules, and link your sitemap. The preview updates live.

How to Use the Generator

Step 1 — Pick a preset. Five options cover the most common cases:

Step 2 — Fine-tune the bot list. Each crawler appears as a checkbox grouped by category: search engines, AI training bots, SEO crawlers, social previews, web archives. Hover over a name to see what the bot does. Check the boxes for the bots you want to block.

Step 3 — Add custom rules (optional). Click "Add User-agent rule" to define your own. Each row has a User-agent (use * for all), space-separated allow paths, and space-separated disallow paths.
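To make the row format concrete, here is a minimal sketch of how such rows could be rendered into robots.txt text. It is not the generator's actual code: RuleRow and render_rules are hypothetical names, and the sketch assumes paths contain no spaces.

```python
from dataclasses import dataclass


@dataclass
class RuleRow:
    """One hypothetical row of the custom-rules form."""
    user_agent: str = "*"
    allow: str = ""      # space-separated paths
    disallow: str = ""   # space-separated paths


def render_rules(rows: list[RuleRow]) -> str:
    """Render rows as robots.txt groups separated by blank lines."""
    groups = []
    for row in rows:
        # An empty User-agent field falls back to "*" (all crawlers).
        lines = [f"User-agent: {row.user_agent.strip() or '*'}"]
        lines += [f"Allow: {path}" for path in row.allow.split()]
        lines += [f"Disallow: {path}" for path in row.disallow.split()]
        groups.append("\n".join(lines))
    return "\n\n".join(groups) + "\n"


rows = [RuleRow("*", allow="/blog/", disallow="/tmp/ /search")]
print(render_rules(rows))
```

Each path becomes its own Allow or Disallow line, because a robots.txt directive takes exactly one path.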

Step 4 — Add your sitemap URL. Recommended for search engine discovery.

Step 5 — Copy or download. Save as robots.txt (lowercase, no extension change), upload to your site's root, and verify by visiting yoursite.com/robots.txt.

Blocking AI Training Crawlers

Since 2023, AI companies have introduced specific user-agents for content scraping, and most now respect robots.txt. To block them all in one click, use the "Block AI crawlers" preset. Here's what each major AI bot does:

Allow vs Disallow vs Crawl-delay

Common Patterns

Where to Place the File

The file must be at the root: https://yoursite.com/robots.txt (not in a subfolder). Use a lowercase filename. Verify deployment by visiting that URL in your browser: you should see the raw text, not a 404 or your homepage. Most static-site generators have a designated location for the file (for example public/ in Next.js and Astro, static/ in Hugo, and the project root in Jekyll); for WordPress, use a plugin like Yoast SEO or upload to the root via FTP. On Cloudflare Pages and Vercel, put the file in whichever directory your framework publishes to the site root, commonly public/.
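The manual check above can also be scripted. The sketch below is a heuristic only, under the assumption that you have already fetched the URL with a tool of your choice and have the status code, Content-Type header, and body in hand; looks_like_robots_txt is a hypothetical helper name.

```python
def looks_like_robots_txt(status: int, content_type: str, body: str) -> bool:
    """Heuristic: is this response a raw robots.txt, not a 404 or HTML page?"""
    if status != 200:
        return False
    if "html" in content_type.lower():
        return False
    head = body.lstrip().lower()
    if head.startswith("<!doctype") or head.startswith("<html"):
        return False
    # Require at least one recognizable directive somewhere in the body.
    directives = ("user-agent:", "allow:", "disallow:", "sitemap:")
    return any(
        line.strip().lower().startswith(directives)
        for line in body.splitlines()
    )


print(looks_like_robots_txt(200, "text/plain", "User-agent: *\nAllow: /\n"))  # True
print(looks_like_robots_txt(200, "text/html", "<!doctype html><html></html>"))  # False
print(looks_like_robots_txt(404, "text/plain", "Not found"))  # False
```

The HTML check catches the common failure where a single-page app or catch-all route serves the homepage for every unknown path.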

Robots.txt Examples

Standard public site — allow all, just point to sitemap
User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

Block AI training while allowing search engines (2025-2026 pattern)
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /

# Allow everyone else (including search engines)
User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

WordPress defaults — block admin, allow ajax
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yoursite.com/sitemap.xml
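The WordPress example works because of the precedence rule in RFC 9309: when several rules match a path, the most specific (longest) one wins, and Allow wins a tie. The sketch below implements only that rule, with plain prefix matching and no * or $ wildcards; is_allowed is a hypothetical helper, not any crawler's real implementation. Parsers that apply rules in file order instead may reach a different answer for the admin-ajax.php path.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest-match evaluation over (directive, path_prefix) pairs.

    Simplified: plain prefix matching only, no '*' or '$' wildcards.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # Longer prefix wins; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


wordpress_rules = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]

print(is_allowed(wordpress_rules, "/wp-admin/options.php"))     # False
print(is_allowed(wordpress_rules, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(wordpress_rules, "/blog/hello-world/"))        # True
```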

Frequently Asked Questions

What is robots.txt?

A plain-text file at yoursite.com/robots.txt that tells crawlers which parts of your site to crawl or skip. Defined by the Robots Exclusion Protocol from 1994. It's a request, not a wall: well-behaved bots respect it, but it's not a security mechanism.

Do AI crawlers respect robots.txt?

Mostly yes, for major AI labs. OpenAI (GPTBot, ChatGPT-User), Anthropic (ClaudeBot, anthropic-ai), Google (Google-Extended), Perplexity, Amazon, and Apple all publicly state that they respect robots.txt and have documented their user-agents. Common Crawl (CCBot) respects it too. Smaller scrapers and bad actors may ignore it; robots.txt is a request, not a wall.

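Does blocking Google-Extended affect my Google Search rankings?

No. Google-Extended is a separate robots.txt token that controls whether your content may be used for AI training (Gemini, Vertex AI). It is not a distinct crawler, and the standard Googlebot used for Google Search is unaffected. You can block Google-Extended and still rank normally in Google Search.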
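A quick way to see that the two tokens are evaluated independently is to run both through a parser. This sketch uses Python's standard urllib.robotparser on an inline file; the URL is a placeholder, and the result only shows how the rules are matched, not how Google itself processes them.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://yoursite.com/blog/post"  # placeholder URL
extended_allowed = parser.can_fetch("Google-Extended", url)
googlebot_allowed = parser.can_fetch("Googlebot", url)

print(extended_allowed)   # False
print(googlebot_allowed)  # True
```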

What is the difference between "Disallow: /" and an empty "Disallow:"?

Disallow: / blocks the entire site. Disallow: with nothing after the colon means "no restrictions", which is equivalent to allowing everything. Be precise: forgetting the slash creates a permissive rule, not a restrictive one.
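The difference is easy to confirm with Python's standard urllib.robotparser. In this small sketch, can_crawl is a hypothetical helper and ExampleBot an invented crawler name.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, path: str = "/any/page") -> bool:
    """Parse the given rules and test one path for a made-up crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("ExampleBot", "https://yoursite.com" + path)


print(can_crawl("User-agent: *\nDisallow: /"))  # False: whole site blocked
print(can_crawl("User-agent: *\nDisallow:"))    # True: empty value blocks nothing
```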

Can I list more than one sitemap?

Yes. Add multiple Sitemap: lines, each on its own line with a full absolute URL. This is common for large sites that split sitemaps by content type (pages, posts, products, images). Search engines will fetch each one.
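For reference, this is how the entries appear to a parser. The sketch uses site_maps() from Python's standard urllib.robotparser (available from Python 3.8); the two sitemap file names are made up.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap-pages.xml
Sitemap: https://yoursite.com/sitemap-posts.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sitemap lines are global: they are not tied to any User-agent group.
sitemaps = parser.site_maps()
print(sitemaps)
```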

Where do I put the robots.txt file?

At the root of your site: https://yoursite.com/robots.txt (all lowercase, keeping the .txt extension). Verify by visiting that URL: you should see the raw text, not a 404 or an HTML page. For static sites, place it in the folder your build copies to the site root; for WordPress, use an SEO plugin or upload via FTP.

Does Google honor Crawl-delay?

No. Google ignores the Crawl-delay directive and manages its crawl rate separately from robots.txt. Bing honors it; support among other crawlers varies, so check each crawler's documentation before relying on it.
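A crawler that does honor the directive reads it per user-agent group. The sketch below shows that lookup with Python's standard urllib.robotparser; the 10-second value is an arbitrary example, and a parser reporting a delay says nothing about whether a given crawler will actually obey it.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

bing_delay = parser.crawl_delay("bingbot")      # delay set in the bingbot group
other_delay = parser.crawl_delay("ExampleBot")  # falls back to the * group

print(bing_delay)   # 10
print(other_delay)  # None
```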

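Is robots.txt a security or privacy mechanism?

No, it's neither. Anyone can read your robots.txt (it's public), so listing "private" URLs there actually advertises them to bad actors. For real protection, use authentication or server-level access controls. To keep an individual page out of search results, use the noindex meta tag, and leave that page crawlable so crawlers can see the tag.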