What Is a Robots.txt File and Why Does It Matter?
A robots.txt file is a plain text file placed in the root directory of your website that tells search engine crawlers which pages and directories they can and cannot access. It follows the Robots Exclusion Protocol (REP) — a standard that well-behaved bots like Googlebot, Bingbot, and others respect. The file is one of the first things search engines check when they visit a new website.
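As an illustration, a minimal robots.txt served at `https://www.example.com/robots.txt` (the domain and paths here are placeholders) might look like this:

```
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
```

The `User-agent` line names which crawler the rules apply to (`*` means all), `Disallow` lists paths that should not be crawled, and the optional `Sitemap` line points crawlers at the site's XML sitemap.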
Robots.txt is a crawl budget management tool. Googlebot has a limited crawl budget for each website — the number of pages it will crawl in a given period. By blocking low-value URLs (admin pages, duplicate content, internal search results, thank-you pages), you direct crawl budget toward the pages that actually matter for SEO. For large websites with thousands of pages, this can meaningfully improve how quickly new content gets indexed.
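You can check how a compliant crawler would interpret such rules using Python's standard-library `urllib.robotparser`, which implements REP matching. The rules and URLs below are illustrative, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules blocking low-value URLs from crawling
rules = """
User-agent: *
Disallow: /wp-admin/
Disallow: /search
Disallow: /thank-you
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Internal search results are blocked (prefix match on the path)
print(parser.can_fetch("*", "https://example.com/search?q=shoes"))

# Regular content pages remain crawlable
print(parser.can_fetch("*", "https://example.com/blog/seo-guide"))
```

Running this prints `False` for the blocked search URL and `True` for the content page, mirroring the decision a well-behaved bot makes before fetching each URL.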
Critically, robots.txt controls crawl access — not indexation. A page blocked by robots.txt will not be crawled, but it can still appear in search results if other sites link to it. To prevent a page from appearing in search results, use a noindex meta tag or X-Robots-Tag header instead. For most sites, the correct approach is: use robots.txt to block crawling of low-value URLs, and use noindex to prevent indexation of specific pages.
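Keeping a page out of the index this way requires that crawlers be allowed to fetch it, so they can actually see the directive. In the page's HTML it looks like:

```
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the equivalent HTTP response header is:

```
X-Robots-Tag: noindex
```

Note the interaction: if a URL carrying a noindex directive is also disallowed in robots.txt, crawlers never fetch it and never see the noindex, so the page can still end up indexed from external links.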