Thursday, April 19, 2007

How to use Robots.txt

Robots.txt:

Robots.txt is a plain text file that tells robots (crawlers) which parts of your site they should not visit. The main purpose of creating robots.txt is to instruct crawlers which pages are not to be crawled. Before a website is indexed by any search engine's crawler, the "robots.txt" file is first retrieved from the website's document root. Crawlers typically download robots.txt about once a day, so changes may not take effect immediately.
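For example, for a site served at www.example.com (a placeholder domain used here only for illustration), a crawler looks for the file at:

http://www.example.com/robots.txt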

Syntax for Robots.txt:

User-agent: [bot name]
Disallow: /[file or folder to block]
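For instance, a minimal robots.txt (the folder names below are just placeholders) that keeps every crawler out of /private/ and additionally keeps Googlebot out of /temp/ would look like this:

User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /temp/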

Examples:
Disallow: /folder*/ will block all subdirectories whose names begin with "folder".
Disallow: /*&* will block all URLs that include &.

Note that the * wildcard is an extension honored by major crawlers such as Googlebot; it is not part of the original robots exclusion standard.


If you want to block URLs by matching their ending characters, use $.
Example:
Disallow: /*.asp$ will block all URLs that end with .asp.
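Putting these pieces together, a single robots.txt that applies all of the wildcard rules above (a sketch, using the same placeholder paths) might look like:

User-agent: *
Disallow: /folder*/
Disallow: /*&*
Disallow: /*.asp$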

