All about robots.txt

If you want to learn more about the mysterious robots.txt file and web robots, The Web Robots Pages will definitely be of help. Their author, Martijn Koster, has generously written an extensive FAQ and gathered code examples, thorough documentation and references.

Sometimes people find they have been indexed by an indexing robot, or that a resource discovery robot has visited part of a site that for some reason shouldn’t be visited by robots.

In recognition of this problem, many Web Robots offer facilities for Web site administrators and content providers to limit what the robot does. This is achieved through two mechanisms:

The Robots Exclusion Protocol

A Web site administrator can indicate which parts of the site should not be visited by a robot by providing a specially formatted file at the root of the site, i.e. example.com/robots.txt. More information on this method is available on The Web Robots Pages.
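
As a sketch of what such a file might look like, a site that wanted to keep all robots out of a directory could serve something like the following (the /private/ path is only an illustration, not a convention):

```
User-agent: *
Disallow: /private/
```

The `*` matches any robot, and each Disallow line names a URL path prefix that compliant robots should not fetch.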

The Robots META tag

A Web author can indicate whether a page may be indexed, or analysed for links, through the use of a special HTML META tag. Full details on how this tag works are provided on The Web Robots Pages.
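
For example, a page that should be neither indexed nor followed for links could include this in its &lt;head&gt;:

```html
<meta name="robots" content="noindex, nofollow">
```

Unlike robots.txt, which a site administrator controls for the whole site, this tag lets the author of an individual page opt that page out.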

If you just want a robots.txt file that allows anyone to crawl your site, add the following to a file and upload it to your root directory (example.com/robots.txt):

User-agent: *
Disallow:
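
To see how a well-behaved crawler actually interprets these rules, here is a minimal sketch using Python's standard-library urllib.robotparser; the MyBot agent name, example.com domain and /private/ path are purely illustrative:

```python
# Sketch: how a compliant crawler checks robots.txt before fetching a URL,
# using Python's standard-library parser. Names and paths are illustrative.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)  # in practice, set_url(...) + read() fetches the live file

print(parser.can_fetch("MyBot", "http://example.com/private/page.html"))  # False
print(parser.can_fetch("MyBot", "http://example.com/index.html"))         # True
```

With the allow-all file shown above (an empty Disallow line), can_fetch would return True for every URL.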

If you happen to know of one or several web robots worth keeping out, please let me know…