Free SEO Consultancy, SEO Basic, SEO Benefits: Robots txt file

"robots.txt" is a plain text file that, because of its name, has special meaning to most well-behaved ("honorable") robots on the web. By defining a few rules in this file, you can instruct robots not to crawl certain files or directories within your site, or not to crawl the site at all.
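Robots look for this file only at the root of a host, for example http://example.com/robots.txt. The minimal Python sketch below derives that location from any page URL with the standard library's urllib.parse; the helper name robots_txt_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://example.com/research/findings/2008.html?x=1#top"))
# http://example.com/robots.txt
```

A file placed in a subdirectory, such as /blog/robots.txt, is not consulted by crawlers.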

There are three common ways to give instructions to robots:

robots.txt
Meta Robots
Nofollow Tag

robots.txt

A basic robots.txt file looks like this:

User-agent: *
Disallow: /

User-agent: * means the rules apply to all robots (all search engines).
Disallow: / specifies which part of the site must not be crawled. Because / is the root, this file blocks every robot from the whole site.
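To check how a parser reads this file, here is a small sketch using Python's standard urllib.robotparser module; the robot names and the example.com URL are only illustrative.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from a string instead of fetching

# "User-agent: *" applies to every robot, and "Disallow: /" covers every path.
for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, "http://example.com/index.html"))  # False
```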

User-agent: Googlebot
Disallow: /images/

This group applies only to Google's crawler (Googlebot) and blocks the /images/ folder from being crawled.
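The sketch below, again using Python's urllib.robotparser with illustrative URLs, shows that a group naming one robot restricts only that robot. Robots that match no group, and find no "User-agent: *" group, are not restricted at all.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot is blocked from /images/ but may crawl everything else.
print(parser.can_fetch("Googlebot", "http://example.com/images/logo.png"))  # False
print(parser.can_fetch("Googlebot", "http://example.com/index.html"))       # True
# No group applies to other robots, so they are not restricted.
print(parser.can_fetch("Bingbot", "http://example.com/images/logo.png"))    # True
```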

Disallow: /*.doc$

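This blocks all Word documents, that is, every URL ending in .doc. The * wildcard matches any sequence of characters and $ marks the end of the URL. Major crawlers such as Googlebot support these wildcards, but not every robot does.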
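Python's urllib.robotparser does not understand * or $, so this sketch hand-rolls a matcher. It assumes Google-style semantics: * matches any run of characters, a trailing $ anchors the end of the path, and otherwise a rule matches by prefix. The function names are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn each escaped '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    """True if the pattern matches from the start of the path."""
    return robots_pattern_to_regex(pattern).match(path) is not None


for path in ("/files/report.doc", "/files/report.docx", "/index.html"):
    print(path, rule_matches("/*.doc$", path))
# /files/report.doc True
# /files/report.docx False
# /index.html False
```

Without the $, the pattern /*.doc would also match /files/report.docx, because rules otherwise match by prefix.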

Sitemap: http://example.com/mainsitemap.xml

This tells robots where to find the site's XML sitemap. The Sitemap line takes a full URL and does not belong to any particular User-agent group.
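As a quick check, Python's urllib.robotparser (3.8 or later) exposes the Sitemap lines it finds through site_maps(). The Disallow rule here is only an illustrative neighbour.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: http://example.com/mainsitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sitemap lines are collected regardless of which group they appear near.
print(parser.site_maps())  # ['http://example.com/mainsitemap.xml']
```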

Allow: /research/findings/*

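The Allow directive explicitly permits crawling of a path. It is mainly useful for opening up a subdirectory inside a directory that is otherwise disallowed.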
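A sketch of that typical use, assuming a hypothetical /research/ directory that is blocked except for its findings. It uses Python's urllib.robotparser, which applies the first matching rule in file order, whereas Google applies the most specific (longest) matching rule. Listing Allow before the broader Disallow makes both give the same answer. The trailing * from the example above is dropped because urllib.robotparser does not understand wildcards, and Google treats a trailing * as redundant.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /research/findings/
Disallow: /research/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("*", "http://example.com/research/findings/2008.html"))  # True
print(parser.can_fetch("*", "http://example.com/research/drafts/notes.html"))   # False
print(parser.can_fetch("*", "http://example.com/about.html"))                   # True
```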

Meta Robots

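A meta robots tag gives instructions for a single page.

Add a meta tag to the page's <head> with name="robots" and content="noindex, nofollow":

<meta name="robots" content="noindex, nofollow">

noindex tells robots not to include the page in search results, and nofollow tells them not to follow the links on the page.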
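To illustrate how a robot might read this tag, here is a minimal sketch built on Python's standard html.parser. The class and function names and the sample page are hypothetical, and real crawlers handle many more cases.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def meta_robots_directives(html):
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = """<html><head>
<title>Private page</title>
<meta name="robots" content="noindex, nofollow">
</head><body>...</body></html>"""
print(sorted(meta_robots_directives(page)))  # ['nofollow', 'noindex']
```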

Nofollow Tag

In some cases we link to another site from our own pages. A link to another site acts as an endorsement and can pass ranking credit to it. To avoid this, add the rel="nofollow" attribute to the anchor tag, which tells search engines not to follow the link or pass credit through it:

<a href="http://example.com/" rel="nofollow">Example site</a>
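A minimal Python sketch of how a crawler might separate ordinary links from rel="nofollow" links, using the standard html.parser. The class name LinkParser and the sample markup are hypothetical; search engines apply their own, more involved policies.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Split the links on a page into followed and nofollow links."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = {name: (value or "") for name, value in attrs}
        href = attributes.get("href", "")
        if not href:
            return
        # rel can hold several space-separated values, e.g. rel="external nofollow".
        if "nofollow" in attributes.get("rel", "").lower().split():
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


page = """<p>Read <a href="/about.html">about us</a> or visit
<a href="http://example.com/" rel="nofollow">an external site</a>.</p>"""

links = LinkParser()
links.feed(page)
links.close()
print(links.followed)    # ['/about.html']
print(links.nofollowed)  # ['http://example.com/']
```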

Nov 6, 2008
