Robots.txt is a plain text file that has special meaning to most well-behaved (“honorable”) robots on the web.

Introduction to “robots.txt”

If a site owner wants to keep web robots out of a specific directory, they must place a text file called robots.txt in the root of the website hierarchy (e.g. www.example.com/robots.txt).
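Because robots only look for the file at the root, a crawler can work out where robots.txt should be from any page URL by keeping the scheme and host and replacing the path. A minimal Python sketch; `robots_txt_url` is a hypothetical helper name, not part of any library:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site hosting page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); drop path, query and fragment,
    # because robots.txt always lives at the root path of the host.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/private/report.html?id=7"))
# prints https://www.example.com/robots.txt
```

Crawlers that follow the protocol do not look for robots.txt in subdirectories, so a file placed at www.example.com/private/robots.txt would simply be ignored.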

By defining a few rules in this robots.txt file, you can instruct web robots not to crawl certain files or directories within your site, or the whole site. In short, a robots.txt file works as a request that robots skip the specified files or directories; well-behaved robots honor it, but nothing forces a robot to obey.

Now that you know what a robots.txt file is, you need to learn what to put in it to give instructions to search engines that follow this protocol (formally the “Robots Exclusion Protocol”). The format is simple enough for most purposes:

How to create a “robots.txt” file

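Each record starts with a User-agent: line that identifies the crawler in question, followed by one or more Disallow: lines that keep it from crawling certain parts or directories of your site. Records are separated by blank lines, so a record's own lines should not contain blank lines.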
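Since a record is just a User-agent: line followed by Disallow: lines, it is easy to generate one programmatically. A minimal Python sketch; `make_record` is a hypothetical helper, not a library API, and it writes one record with no blank lines inside it:

```python
def make_record(user_agent: str, disallowed: list[str]) -> str:
    """Build the text of one robots.txt record (hypothetical helper)."""
    lines = [f"User-agent: {user_agent}"]
    if disallowed:
        # One Disallow: line per path prefix the robot should skip.
        lines.extend(f"Disallow: {path}" for path in disallowed)
    else:
        # An empty Disallow: value means nothing is disallowed.
        lines.append("Disallow:")
    # Join without blank lines: a blank line would end the record early.
    return "\n".join(lines) + "\n"


print(make_record("*", ["/cgi-bin/", "/private/"]), end="")
```

Several records can be joined with a blank line between them to address different robots in the same file.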

The following example allows all robots to visit all files: the wildcard “*” matches all robots, and the empty Disallow: value disallows nothing.

User-agent: *
Disallow:

The following example keeps all robots out of the entire site:

User-agent: *
Disallow: /
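To see how a crawler that follows the protocol reads the two examples above, Python's standard-library `urllib.robotparser` module can parse the rules directly, without fetching anything over the network. A short sketch; the helper `allowed`, the agent name `ExampleBot`, and the URL are placeholders chosen for illustration:

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Parse robots.txt text and ask whether `agent` may fetch `url` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


allow_all = "User-agent: *\nDisallow:\n"
keep_out = "User-agent: *\nDisallow: /\n"

url = "https://www.example.com/index.html"
print(allowed(allow_all, url))  # True: an empty Disallow: blocks nothing
print(allowed(keep_out, url))   # False: every path starts with "/"
```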

The next example tells all crawlers not to enter three directories of the website:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /private/
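Each Disallow: value is matched as a prefix of the URL path. The sketch below feeds the three-directory example to Python's `urllib.robotparser` and checks a few sample paths; the paths, host, and agent name `ExampleBot` are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "https://www.example.com"
for path in ["/index.html", "/cgi-bin/search.cgi", "/images/logo.png",
             "/private/notes.txt", "/private"]:
    # In urllib.robotparser, the first rule whose path is a prefix of the
    # URL path decides; if no rule matches, the URL is allowed.
    print(path, parser.can_fetch("ExampleBot", base + path))
```

Note the last case: /private (without a trailing slash) does not start with /private/, so it stays allowed. Writing Disallow: /private instead would block both /private and everything under /private/, as well as any other path that begins with /private, such as /private.html.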