Robots.txt Generator

Runs entirely in your browser

Create SEO-friendly robots.txt files to control crawler access. Define rules for different user-agents, set disallow paths, crawl delays, and sitemap URLs.

User-Agent Rules

No paths blocked for this user-agent

Optional Settings

Optional: URL to your XML sitemap

Optional: Add delay between bot requests

Generated robots.txt

User-agent: *
Disallow:

Quick Tips

  • Use * as user-agent to apply rules to all bots
  • End paths with / to block entire directories
  • Save as robots.txt in your root directory
  • Test in Google Search Console's robots.txt tester

How robots.txt Works — and Where It Quietly Fails

A robots.txt file is a plain-text file served at the root of a domain (https://example.com/robots.txt) that follows the Robots Exclusion Protocol originally drafted at robotstxt.org in 1994 and standardised by the IETF as RFC 9309 in 2022. It tells well-behaved crawlers which URL paths they may fetch using two core directives: User-agent (the bot the block applies to) and Disallow (the path prefix to skip).
This generator emits the canonical syntax: each block opens with one or more User-agent lines, followed by Disallow rules and an optional Allow rule that overrides a broader Disallow. Paths are matched as prefixes against the request URL, with two wildcard characters supported by major crawlers — * for any sequence and $ to anchor the end of a URL.
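
For illustration, a block using those directives and both wildcards might look like the sketch below; the /private/, /private/press-kit/, and PDF paths are hypothetical.

# Block a directory, re-open one subfolder, and block all PDFs
User-agent: *
Disallow: /private/
Allow: /private/press-kit/
Disallow: /*.pdf$

The Allow line wins for URLs under /private/press-kit/ because it is the longer, more specific prefix, and /*.pdf$ matches any URL that ends in .pdf.
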
The most important thing to understand is that robots.txt is advisory, not enforced. It is a request, not a firewall. Compliant crawlers (Googlebot, Bingbot, DuckDuckBot, most academic bots) honour it; aggressive scrapers, malware harvesters, and many AI training crawlers either ignore it or read it to find what you are trying to hide. Anything truly sensitive needs server-side authentication, not a Disallow line.
Disallow blocks crawling, but it does not block indexing. If another site links to a Disallow-ed URL, Google can still list it in search results with no snippet — the page is indexed without ever being fetched. To keep a page out of the index entirely, allow crawling and add a meta robots noindex tag or X-Robots-Tag HTTP header instead.
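
In practice the de-indexing directive therefore lives on the page itself rather than in robots.txt, in either of these equivalent forms:

<meta name="robots" content="noindex">
X-Robots-Tag: noindex

Googlebot can only obey noindex if it is allowed to fetch the page, so do not pair it with a Disallow rule for the same URL.
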
Crawl-delay is the most misunderstood directive in the file. Google has officially ignored it since 2019 — set crawl rate inside Search Console instead. Bing, Yandex, and Seznam still honour Crawl-delay (in seconds between hits), so the directive is only useful if those crawlers matter to you.
The Sitemap directive is independent of any User-agent block and is read by all major engines. Listing your sitemap.xml URL here is the simplest way to register it without using webmaster tools, and you can list multiple sitemaps if you split them by content type or language.
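
A sketch of how the two optional settings typically sit together in one file; the Bingbot group, the ten-second delay, and the sitemap filenames are illustrative.

User-agent: Bingbot
Disallow:
Crawl-delay: 10

Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-posts.xml

Crawl-delay applies only to the crawlers named in its group, while the Sitemap lines sit outside any group and are read by every engine that fetches the file.
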
Keep the file under 500 KiB — Google parses only the first 500 KiB and silently drops the rest. Use # for comments, put each directive on its own line, and validate the result in Google Search Console's robots.txt Tester before deploying, since a stray Disallow: / can deindex an entire site within days.

Common Use Cases

01

Block staging and preview environments

Add Disallow: / under User-agent: * on staging.example.com so pre-launch builds never appear in Google or leak unfinished copy.
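
The entire staging file can be just two lines:

User-agent: *
Disallow: /

Swap it for the production rules at launch; shipping Disallow: / to the live domain can deindex the whole site.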

02

Save crawl budget on faceted search

Disallow query-string filter URLs (/search, /products?color=) so Googlebot spends its budget on canonical product and category pages instead.
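
A sketch of the matching rules, using the example paths above; the /*&color= variant is an added assumption that catches the filter when it is not the first query parameter.

User-agent: *
Disallow: /search
Disallow: /*?color=
Disallow: /*&color=

Repeat the last two lines for each faceting parameter you want crawlers to skip.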

03

Hide internal admin and account paths

List /admin/, /account/, and /checkout/ to keep dashboards and authenticated routes out of public crawl logs and search results.
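
Written out, those paths form a single group:

User-agent: *
Disallow: /admin/
Disallow: /account/
Disallow: /checkout/

This only reduces visibility in crawls and search results; anything genuinely sensitive behind these paths still needs real authentication.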

04

Block AI training crawlers selectively

Add specific User-agent blocks for GPTBot, ClaudeBot, CCBot, Google-Extended, and PerplexityBot if you want to opt out of LLM training datasets.
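
One way to write that opt-out is to group the bots named above over a single rule, since a group may open with several User-agent lines:

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: PerplexityBot
Disallow: /

Crawlers not listed here still fall through to your User-agent: * group, so regular search indexing is unaffected.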

Frequently Asked Questions

Does Disallow keep a page out of Google's search results?
No. Disallow blocks crawling, not indexing. Google can still list a Disallow-ed URL with no snippet if external sites link to it. To remove a page from the index, allow crawling and add a meta robots noindex tag (or an X-Robots-Tag HTTP header) so Googlebot can read the directive.

Does Google still honour Crawl-delay?
Google has officially ignored Crawl-delay since 2019 — they consider it imprecise and recommend setting crawl rate inside Search Console instead. Bing, Yandex, and Seznam still honour it. The number is in seconds between requests, so Crawl-delay: 10 means at most one request every ten seconds for those crawlers.

Can robots.txt block AI training crawlers like GPTBot, ClaudeBot, and CCBot?
Yes — they all publish a User-agent string and read robots.txt. Add a block per bot with Disallow: / to opt out of training. Note that this only stops the well-behaved ones; some scrapers ignore the file entirely, and a robots.txt block does not retroactively remove your content from models already trained on it.

What happens if my robots.txt returns an error instead of a 200?
Google treats a persistent 5xx as if the whole site were disallowed and stops crawling it for around 30 days, which can devastate organic traffic. A 4xx (especially 404) is treated as "no restrictions" — Googlebot crawls everything. Always serve a valid 200 response, even if the file is empty.

Is there a size limit on robots.txt?
Google parses the first 500 KiB and silently ignores anything beyond that, so very long files can leave later directives unread. If you find yourself near the limit, consolidate rules with wildcards and the $ end-anchor instead of listing thousands of individual paths.

If an Allow rule and a Disallow rule both match a URL, which wins?
Within a User-agent group, Google applies the most specific (longest) matching path, not the first one listed. So Allow: /blog/public/ will override a broader Disallow: /blog/ regardless of which line comes first. Other crawlers may use first-match — keep groups simple to avoid surprises.
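
Laid out as a file, that example reads:

User-agent: Googlebot
Disallow: /blog/
Allow: /blog/public/

A URL such as /blog/public/launch.html is crawlable for Google because the Allow prefix is thirteen characters long and the Disallow prefix only six; the order of the two lines makes no difference.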

Is it safe to list private or sensitive paths in robots.txt?
No — never use it as a security tool. Listing /admin/ or /private-data/ in a public file just advertises those paths to attackers. Anything sensitive must be protected by authentication, IP allow-listing, or simply not deployed to a public server. robots.txt is read by anyone who types /robots.txt into a browser.

Are User-agent names case-sensitive?
No. Per RFC 9309, User-agent matching is case-insensitive, so Googlebot, googlebot, and GOOGLEBOT are equivalent. Path matching, however, is case-sensitive — Disallow: /Admin/ will not block /admin/.

Can I list more than one sitemap?
Yes. List one Sitemap: directive per line, anywhere in the file (it is not tied to a User-agent block). This is useful when you split a sitemap by language, content type, or because a single sitemap exceeds the 50,000-URL or 50 MB limit and you need a sitemap index plus children.

How do I test my robots.txt before deploying it?
Use Google Search Console's robots.txt Tester — it parses your file with the same code Googlebot uses and lets you check whether a specific URL would be blocked for a chosen user-agent. Bing Webmaster Tools offers an equivalent tester. Always test before pushing, since a misplaced slash can deindex large sections of a site within hours.
