
What is Robots.txt?

Written on August 8, 2006

What is the Robots Text File?
Robots.txt is used to control search engine spiders’ access to pages on your site.

Why would you want to do this?
You may have created a personnel page for company employees that you don’t want listed. Some webmasters use it to exclude their guest book pages, to make them less of a target for spammers. There are many other reasons to use the robots text file.

How do I use it?
You need to upload it to the root of your web site, or it will not work. If you don’t have access to the root, you will need to use a robots Meta tag on each page instead. Each record in the file needs both a user agent and a file or folder to disallow.
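Spiders only ask for the file at one fixed address, the root of the host, which is why a copy in a subfolder is never read. As a rough illustration (the function name robots_url and the example.com addresses are made up for this sketch), this Python snippet works out where a spider would look for robots.txt, given any page on a site:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the address where a spider would look for robots.txt."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.example.com/guides/page.htm"))
# prints http://www.example.com/robots.txt
```

If your pages live under something like /~username/ on a shared host, the file would still have to sit at the host's root, which is the case where the Meta tag is the fallback.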

What does it look like?
It’s a plain text file, the kind you can create in Notepad, named “robots.txt”.
The basic syntax is
User-agent: spider's name here
Disallow: /filename here
If you use
User-agent: *
the * acts as a wildcard, so the Disallow lines that follow apply to all spiders. You may want to use this to stop search engines listing unfinished pages.
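To see the wildcard at work you can feed a small rule set to the robots.txt parser that ships with Python (urllib.robotparser). This is only a sketch: the spider names, the /unfinished/ folder and the example.com addresses are invented for the example, and real spiders may read edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every spider out of an unfinished section.
RULES = """\
User-agent: *
Disallow: /unfinished/
"""

def is_allowed(rules, spider, url):
    """Check whether the named spider may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(spider, url)

# The spider names are made up; the wildcard record covers any name.
for spider in ("ExampleBot", "AnotherSpider"):
    draft = is_allowed(RULES, spider, "http://www.example.com/unfinished/draft.htm")
    home = is_allowed(RULES, spider, "http://www.example.com/index.htm")
    print(spider, "draft:", draft, "home:", home)
```

Both spiders should be refused the draft page and allowed the home page, because the single * record applies to each of them.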
To disallow an entire directory use
Disallow: /mydirectory/
To disallow an individual file use
Disallow: /file.htm
You have to use a separate line for each disallow. You cannot, for example, use
Disallow: /file1.htm,file2.html
You should use
User-agent: *
Disallow: /file1.htm
Disallow: /file2.htm
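The reason for the one-path-per-line rule is that a parser treats everything after "Disallow:" as a single path. A short sketch with Python's built-in urllib.robotparser shows the difference; the helper name blocked, the spider name ExampleBot and the example.com host are assumptions made for the illustration, and other spiders may not behave exactly like Python's parser.

```python
from urllib.robotparser import RobotFileParser

def blocked(rules, paths, spider="ExampleBot"):
    """Return the paths that the (made-up) spider may not fetch."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [p for p in paths
            if not parser.can_fetch(spider, "http://www.example.com" + p)]

# One path per Disallow line, as recommended.
SEPARATE = "User-agent: *\nDisallow: /file1.htm\nDisallow: /file2.htm\n"
# Two files squeezed onto one line: read as one odd-looking path.
COMBINED = "User-agent: *\nDisallow: /file1.htm,file2.htm\n"

PATHS = ["/file1.htm", "/file2.htm"]
print(blocked(SEPARATE, PATHS))  # both files are blocked
print(blocked(COMBINED, PATHS))  # neither file is blocked
```

With the combined line the parser sees one path with a comma in the middle, which matches neither file, so both stay open to spiders.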

For a list of spider names visit
http://www.robotstxt.org/wc/active/html/
Make sure you use the right syntax; if you don’t, it will not work. You can check your syntax here: http://www.searchengineworld.com/cgi-bin/robotcheck.cgi
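If you would like a quick check on your own machine before uploading, something along these lines can catch the common slips. It is a minimal sketch, not a replacement for a proper checker: the function name check_robots is made up, it only knows the two fields covered in this guide (User-agent and Disallow), and it treats a comma in a path as a likely mistake even though commas are technically legal in URLs.

```python
KNOWN_FIELDS = {"user-agent", "disallow"}  # only the fields used in this guide

def check_robots(text):
    """Return a list of (line number, message) for lines that look wrong."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' between field and value"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, "unknown field '%s'" % field))
        elif field == "user-agent":
            seen_agent = True
        elif not seen_agent:
            problems.append((number, "Disallow before any User-agent line"))
        elif "," in value:
            problems.append((number, "use one path per Disallow line"))
    return problems

for problem in check_robots("Use-agent/*\nDisallow: /file1.htm\n"):
    print(problem)
```

Here the misspelt first line is reported for its missing colon, and the Disallow line is reported because no valid User-agent line came before it.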

Filed in: Guides, On The Web.
