Free Technical SEO Tool — No Login Required

Robots.txt Generator

Generate a valid robots.txt file for any website. Choose from WordPress, Shopify, and other presets, or build custom rules — then download or copy your file in one click. Control search engine crawler access to protect admin pages, private sections, and low-value URLs.

Generated robots.txt

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/

Sitemap: https://example.com/sitemap.xml

How it works

From directives to crawler control.

Build your robots.txt file in three steps — no coding required.

01

Choose a Preset or Build Custom Rules

Select a platform preset (WordPress, Shopify) to start with sensible defaults for your CMS, or choose "Custom" to build rules from scratch. Each preset blocks sensitive admin areas and low-value URLs that search engines should not index.

02

Configure User-agents and Directives

Add Allow and Disallow directives for specific URL paths. Target all bots with User-agent: * or restrict rules to specific crawlers like Googlebot or GPTBot. Add your sitemap URL to help search engines discover all your indexable pages.
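For example, this step might produce a file that applies one set of rules to every crawler but shuts out a single bot entirely (paths and domain are illustrative):

```
User-agent: *
Disallow: /internal-search/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```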

03

Download or Copy Your File

Copy the generated robots.txt to your clipboard or download it as a .txt file. Upload it to your website's root directory (domain.com/robots.txt) and validate it in Google Search Console to ensure crawlers are reading it correctly.

What Is a Robots.txt File and Why Does It Matter?

A robots.txt file is a plain text file placed in the root directory of your website that tells search engine crawlers which pages and directories they can and cannot access. It follows the Robots Exclusion Protocol (REP) — a standard that well-behaved bots like Googlebot, Bingbot, and others respect. The file is one of the first things search engines check when they visit a new website.

Robots.txt is a crawl budget management tool. Googlebot has a limited crawl budget for each website — the number of pages it will crawl in a given period. By blocking low-value URLs (admin pages, duplicate content, internal search results, thank-you pages), you direct crawl budget toward the pages that actually matter for SEO. For large websites with thousands of pages, this can meaningfully improve how quickly new content gets indexed.

Critically, robots.txt controls crawl access — not indexation. A page blocked by robots.txt will not be crawled, but it can still appear in search results if other sites link to it. To prevent a page from appearing in search results, use a noindex meta tag or X-Robots-Tag header instead. For most sites, the correct approach is: use robots.txt to block crawling of low-value URLs, and use noindex to prevent indexation of specific pages.
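For example, to keep a crawlable page out of search results you would use a noindex directive rather than a Disallow rule, either in the page's HTML or as an HTTP response header:

```html
<!-- In the page's <head>: the page stays crawlable but is kept out of results -->
<meta name="robots" content="noindex">

<!-- Equivalent HTTP response header (set server-side):
     X-Robots-Tag: noindex -->
```

Note that noindex only works if the page remains crawlable; if robots.txt blocks the URL, Googlebot never sees the tag.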

Robots.txt Syntax and Directives Explained

Each section of a robots.txt file begins with a User-agent line that specifies which crawler the following rules apply to. "User-agent: *" applies rules to all crawlers. "User-agent: Googlebot" applies rules only to Google's crawler. You can have multiple User-agent blocks in one file, each with its own set of rules.

The Disallow directive tells the specified crawler not to access a particular URL path. "Disallow: /admin/" blocks all URLs starting with /admin/. "Disallow: /" blocks the entire site. "Disallow:" with no path means allow everything — this is how you explicitly grant access. The Allow directive can be used to create exceptions within a broader Disallow rule.
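A short sketch combining these directive forms (paths are illustrative):

```
User-agent: *
# Block the admin section, with one exception inside it
Disallow: /admin/
Allow: /admin/status

User-agent: ExampleBot
# An empty Disallow explicitly allows everything for this bot
Disallow:
```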

The Sitemap directive (at the bottom of the file) tells crawlers the location of your XML sitemap. This is not part of the original robots exclusion protocol but is supported by all major search engines. Including your sitemap URL in robots.txt helps Googlebot discover all your indexable URLs even when the crawl budget is limited.

Common Robots.txt Mistakes That Hurt SEO

The most damaging robots.txt mistake is accidentally blocking your entire site with "Disallow: /". This happens more often than you'd think — especially during site migrations when developers forget to update the robots.txt from the staging configuration. Always verify your live robots.txt at domain.com/robots.txt after any major site change.

Blocking CSS and JavaScript files is another common error. Googlebot needs access to CSS and JS to render your pages correctly. If these files are blocked, Google may see a very different version of your pages than users do — which can negatively impact mobile-first indexing and page experience signals. Never block /wp-content/themes/ on WordPress sites without testing.
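If you do block an asset-heavy directory, Allow exceptions with wildcards can keep renderable assets accessible. A sketch of the pattern, worth testing against your own URLs:

```
User-agent: *
Disallow: /wp-content/plugins/
# Keep CSS and JS crawlable so Google can render pages correctly
Allow: /wp-content/plugins/*.css
Allow: /wp-content/plugins/*.js
```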

Using robots.txt to hide sensitive data is a security mistake. Bots that ignore robots.txt — and many do — will still access blocked URLs. Sensitive data (user data, admin interfaces, private files) should be protected by server-level authentication, not robots.txt. Additionally, publicly listing blocked paths in robots.txt can alert malicious crawlers to the existence of sensitive directories.

Robots.txt for WordPress, Shopify, and Other Platforms

WordPress sites should block /wp-admin/ (except /wp-admin/admin-ajax.php, which powers dynamic features), /wp-includes/, and plugin/theme directories that contain no user-facing content. Do not block /wp-content/uploads/, as this contains images and media that Google should index for image search.

Shopify sites should block /admin, /cart, /checkout, /orders, and /account pages — these are user-specific and should not be indexed. Shopify also generates faceted navigation URLs with query parameters (?sort_by=, ?filter=) that create duplicate content. Blocking all query-parameter URLs with a wildcard rule (Disallow: /*?) is common practice, though verify this doesn't block any important filtered collection pages before deploying.
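A sketch of the Shopify rules described above, using targeted parameter patterns instead of a blanket query-string block (verify against your own collection URLs before deploying):

```
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Disallow: /orders
Disallow: /account
# Faceted navigation duplicates (matches these parameters when they lead the query string)
Disallow: /*?sort_by=
Disallow: /*?filter=
```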

For custom sites and frameworks (Next.js, Nuxt, Django, Rails), create platform-specific rules based on your URL structure. Common targets for disallowing include /api/ endpoints (if they return raw data, not rendered pages), /search? result pages, /thank-you/ pages, and any staging or preview URL patterns that might accidentally get indexed.
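For instance, a custom-site sketch along those lines (adjust the paths to your actual URL structure):

```
User-agent: *
Disallow: /api/
Disallow: /search?
Disallow: /thank-you/
Disallow: /preview/

Sitemap: https://example.com/sitemap.xml
```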

How to Validate and Test Your Robots.txt File

After uploading your robots.txt file to the root directory, verify it's accessible at https://yourdomain.com/robots.txt. Use Google Search Console's robots.txt report (under Settings) to confirm Google can fetch the file; the report also surfaces syntax errors. To test whether a specific URL is allowed or blocked by your current configuration, use the URL Inspection tool.

Test every critical URL type after making changes. Check that your homepage (/) is allowed, that your most important pages are allowed, and that the paths you intended to block are actually blocked. A simple mistake in path formatting (missing trailing slash, wrong directory name) can block unintended pages.
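You can also run these checks locally before deploying. A minimal sketch using Python's standard-library urllib.robotparser; note that Python's parser applies rules in file order (first match wins), so the Allow exception is listed before the broader Disallow, whereas Google resolves conflicts by the most specific matching rule:

```python
from urllib.robotparser import RobotFileParser

# Rules under test: a WordPress-style preset.
# Python applies rules in order (first match wins), so the
# Allow exception must precede the broader Disallow here.
rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Check each critical URL type against the parsed rules
print(rp.can_fetch("Googlebot", "https://example.com/"))                         # True
print(rp.can_fetch("Googlebot", "https://example.com/wp-admin/options.php"))     # False
print(rp.can_fetch("Googlebot", "https://example.com/wp-admin/admin-ajax.php"))  # True
```

One caveat: urllib.robotparser does not support wildcard patterns, so test wildcard rules with Google's own tools instead.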

Monitor your crawl stats in Google Search Console after deploying changes. Under "Settings > Crawl stats," you'll see Googlebot's crawl activity over time. If crawling drops significantly after a robots.txt update, investigate which pages were blocked. Use the URL Inspection tool to check the robots.txt rules for specific URLs.

Frequently Asked Questions

What is a robots.txt file?

A robots.txt file is a plain text file in your website's root directory that tells search engine crawlers which pages they can and cannot access. It follows the Robots Exclusion Protocol, respected by Googlebot, Bingbot, and other well-behaved crawlers. It's used to manage crawl budget, block admin pages, and prevent indexing of low-value URLs.

Where should I put my robots.txt file?

The robots.txt file must be placed in the root directory of your domain at https://yourdomain.com/robots.txt. It cannot be in a subdirectory — a file at /blog/robots.txt will not be found by crawlers. Most CMS platforms (WordPress, Shopify) handle this automatically.

Does robots.txt prevent pages from being indexed?

No — robots.txt controls crawl access, not indexation. A blocked page cannot be crawled, but Google can still index it if other websites link to it. To prevent a page from appearing in search results, use a noindex meta tag or X-Robots-Tag HTTP header instead of (or in addition to) a Disallow rule.

What is Disallow: / in robots.txt?

"Disallow: /" under "User-agent: *" blocks all crawlers from accessing any page on your website. This is the most extreme robots.txt configuration and is commonly used on staging environments to prevent Google from indexing test content. Never deploy "Disallow: /" on a live production site.

How do I block specific bots in robots.txt?

Use a specific User-agent name followed by Disallow directives. For example, to block GPTBot (OpenAI's crawler): "User-agent: GPTBot" on one line and "Disallow: /" on the next. You can find the official User-agent names for specific bots in their documentation.
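As a sketch, a block that shuts out GPTBot entirely while leaving all other crawlers governed by your existing rules:

```
User-agent: GPTBot
Disallow: /
```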

Should I include my sitemap in robots.txt?

Yes — adding "Sitemap: https://yourdomain.com/sitemap.xml" at the end of your robots.txt helps search engines discover all your indexable pages. It's especially helpful for large sites where crawlers might otherwise miss recently published content.

Can robots.txt be used to protect sensitive content?

No — robots.txt is not a security mechanism. Many bots (including malicious scrapers) ignore robots.txt entirely. Sensitive content should be protected by server-level authentication, login requirements, or IP restrictions. Listing sensitive paths in robots.txt can actually advertise their existence to bad actors.

How do I test if my robots.txt is working?

Use Google Search Console's robots.txt report (under Settings) to confirm Google can fetch your file and to catch syntax errors. Use the URL Inspection tool to check how Googlebot views a specific page, including whether it's blocked by robots.txt. After making changes, monitor your Crawl Stats report to verify the impact on crawl activity.

No credit card required

Ready to monitor your full SEO health?

Serplight tracks keyword rankings, analyzes SERP competitors, and generates content strategies — all in one platform.