robots.txt Protocol
The robots exclusion standard, or robots.txt protocol, is a convention for asking cooperating web spiders and other web robots not to access all or part of a website. The parts that should not be accessed are listed in a file called robots.txt, which must be placed in the top-level directory (root directory) of the website.

The protocol is purely advisory. It relies on the cooperation of the web robot, so marking an area of your site out of bounds with robots.txt does not guarantee privacy. Many website administrators have been caught trying to use the robots file to make private parts of a website invisible to the rest of the world. However, the file is necessarily publicly available and is easily checked by anyone with a web browser.
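To make the privacy point concrete, here is a minimal sketch in Python showing how little effort it takes to read the excluded paths out of a robots.txt file once its text has been downloaded. The function name disallowed_paths and the sample content are hypothetical and used only for illustration.

```python
def disallowed_paths(robots_text):
    """Return every non-empty Disallow value found in robots.txt content."""
    paths = []
    for raw_line in robots_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file content, as anyone could fetch it with a browser.
sample = """User-agent: *
Disallow: /private/
Disallow: /tmp/
"""
print(disallowed_paths(sample))
```

Anything listed in the file is therefore advertised rather than hidden, so genuinely private areas need real access control on the server.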

Overall, it is a good idea to have a robots.txt file, as all the major search engines look for one when their spiders, bots and crawlers arrive at your site. If you are in the habit of checking your server logs, having the file also reduces the number of 404 Not Found errors generated by spiders that visit your site and cannot find a robots.txt file.

Creating a robots.txt File

A robots.txt file can be created with a plain text editor such as Notepad. Create a new text file and save it as robots.txt. The format for indicating which directories and files should NOT be indexed is:

User-Agent: Spider or robot name
Disallow: Directory or File Name

These lines can be repeated for each directory or file you want to keep from being indexed, and for each spider or robot you want to exclude, but there are a couple of shorthand methods available.
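If you would rather generate the file than type it by hand, the format above can be produced with a few lines of Python. This is only a sketch: the function name build_robots_txt and the shape of its rules argument are assumptions made for illustration, not part of the protocol.

```python
def build_robots_txt(rules):
    """Build robots.txt content.

    rules maps a robot name (or "*") to the list of paths it must not visit.
    """
    blocks = []
    for agent, paths in rules.items():
        lines = ["User-agent: " + agent]
        # An empty list becomes a bare "Disallow:" line, which excludes nothing.
        for path in paths or [""]:
            lines.append(("Disallow: " + path).rstrip())
        blocks.append("\n".join(lines))
    # Records for different robots are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


print(build_robots_txt({"*": ["/cgi-bin/", "/tmp/"]}))
```

The returned string can then be saved as robots.txt in the root directory of the site.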

Examples

This example allows all robots to visit all files: the wildcard "*" matches every robot, and the empty Disallow line excludes nothing.

User-agent: *
Disallow:

This example keeps all robots out:


User-agent: *
Disallow: /
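The difference between an empty Disallow line and "Disallow: /" is easy to overlook, so it can be worth checking it with a parser. The sketch below uses RobotFileParser from Python's standard urllib.robotparser module; the helper name allowed, the robot name ExampleBot and the test path are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, path, agent="ExampleBot"):
    """Report whether the given robots.txt lines let agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, path)


allow_all = ["User-agent: *", "Disallow:"]
deny_all = ["User-agent: *", "Disallow: /"]

print(allowed(allow_all, "/any/page.html"))
print(allowed(deny_all, "/any/page.html"))
```

The first call should report that the page may be fetched and the second that it may not, which matches the two examples above.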

The next is an example that tells all crawlers not to enter into four directories of a website:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
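A cooperating crawler applies these rules by comparing the start of each requested path with the Disallow entries. As a rough illustration, the same four-directory file can be tested with Python's standard urllib.robotparser module; the helper name is_allowed, the robot name ExampleBot and the example.com URLs are assumptions.

```python
from urllib.robotparser import RobotFileParser

rules = """User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def is_allowed(url, agent="ExampleBot"):
    """Return True if the rules above permit agent to fetch url."""
    return parser.can_fetch(agent, url)


print(is_allowed("http://example.com/index.html"))
print(is_allowed("http://example.com/private/notes.html"))
```

Pages outside the four listed directories remain open to every robot, since nothing else is disallowed.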

This example tells a specific crawler not to enter one specific directory:

User-agent: BadBot
Disallow: /private/
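Because this record names only BadBot, other robots are not restricted by it. A short check with Python's standard urllib.robotparser module illustrates this; the helper name may_fetch and the robot name GoodBot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

badbot_rules = RobotFileParser()
badbot_rules.parse(["User-agent: BadBot", "Disallow: /private/"])


def may_fetch(agent, path):
    """Return True if the BadBot-only rules let agent fetch path."""
    return badbot_rules.can_fetch(agent, path)


print(may_fetch("BadBot", "/private/page.html"))
print(may_fetch("GoodBot", "/private/page.html"))
```

Only a robot that identifies itself as BadBot, and that chooses to honour the file, is kept out of /private/.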

Example for the default installation of PHP Fusion:

User-agent: *
Disallow: /administration/
Disallow: /images/
Disallow: /locale/
Disallow: /themes/
Disallow: /blank_config.php
Disallow: /config.php
Disallow: /edit_profile.php
Disallow: /footer.php
Disallow: /maincore.php
Disallow: /members.php
Disallow: /setuser.php
Disallow: /side_left.php
Disallow: /side_right.php
Disallow: /subheader.php

This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Robots.txt".