What Is a Robots.txt File and Why Does It Matter?
A robots.txt file is a plain text file placed in the root directory of your website that tells search engine crawlers which pages and directories they can and cannot access. It follows the Robots Exclusion Protocol (REP) — a standard that well-behaved bots like Googlebot, Bingbot, and others respect. The file is one of the first things search engines check when they visit a new website.
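As an illustration, a minimal robots.txt served at `https://www.example.com/robots.txt` (the domain and paths here are placeholders) might look like this:

```
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
```

The `User-agent` line names which crawler the rules apply to (`*` means all), `Disallow` lists paths that should not be crawled, and the optional `Sitemap` line points crawlers at the site's XML sitemap.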
Robots.txt is a crawl budget management tool. Googlebot has a limited crawl budget for each website — the number of pages it will crawl in a given period. By blocking low-value URLs (admin pages, duplicate content, internal search results, thank-you pages), you direct crawl budget toward the pages that actually matter for SEO. For large websites with thousands of pages, this can meaningfully improve how quickly new content gets indexed.
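You can check how a compliant crawler would interpret such rules using Python's standard-library `urllib.robotparser`, which implements REP matching. The rules and URLs below are illustrative, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules blocking low-value URLs from crawling
rules = """
User-agent: *
Disallow: /wp-admin/
Disallow: /search
Disallow: /thank-you
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Internal search results are blocked (prefix match on the path)
print(parser.can_fetch("*", "https://example.com/search?q=shoes"))

# Regular content pages remain crawlable
print(parser.can_fetch("*", "https://example.com/blog/seo-guide"))
```

Running this prints `False` for the blocked search URL and `True` for the content page, mirroring the decision a well-behaved bot makes before fetching each URL.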
Critically, robots.txt controls crawl access — not indexation. A page blocked by robots.txt will not be crawled, but it can still appear in search results if other sites link to it. To prevent a page from appearing in search results, use a noindex meta tag or X-Robots-Tag header instead. For most sites, the correct approach is: use robots.txt to block crawling of low-value URLs, and use noindex to prevent indexation of specific pages.
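Keeping a page out of the index this way requires that crawlers be allowed to fetch it, so they can actually see the directive. In the page's HTML it looks like:

```
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the equivalent HTTP response header is:

```
X-Robots-Tag: noindex
```

Note the interaction: if a URL carrying a noindex directive is also disallowed in robots.txt, crawlers never fetch it and never see the noindex, so the page can still end up indexed from external links.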