How to configure the robots.txt file

20/12/2015 Amit Gupta 2455 SEO

In this article, you will understand the importance of the robots.txt file and how to configure it properly to guide search engine bots as they crawl and index your website.

 

A website appears in a search engine's results only if that search engine has indexed it. Bots from the various search engines crawl websites and index their URLs so that the pages can show up in search results. By default, bots crawl the whole website. You can instruct these bots using a robots.txt file. Let us understand what exactly a robots.txt file is and how to configure it.

Robots.txt file

The robots.txt file is a simple text file placed in the root directory of the web server, as shown:

http://www.modernpathshala.com/robots.txt 

This file tells web crawlers like Googlebot which files and directories they may access and which they may not. It is important to place the robots.txt file in the root directory, because crawlers look for it there and not anywhere else on your website.

So, if the file is not in the right place, crawlers will not find it and will crawl your whole website.
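Because the file always lives at the root, its address can be derived from any page URL on the same site. The following is a minimal Python sketch of that idea; the function name robots_txt_url and the sample page path are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; drop the page path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page path, used only to demonstrate the function.
print(robots_txt_url("http://www.modernpathshala.com/articles/seo.html?page=2"))
```

Whatever page you start from, the result points at the same root-level file.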

Let us understand the structure of the robots.txt file:

  1. User-Agent:- This directive names the crawler to which the rules that follow apply.
  2. Disallow:- This directive advises a crawler not to crawl a file, page or directory. It controls crawling only; a blocked URL can still end up in the index if other pages link to it.
  3. Allow:- This directive specifies pages/directories that may be crawled, typically as an exception inside a disallowed directory.
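To make the structure concrete, here is a small Python sketch that reads a robots.txt text and groups the Allow/Disallow rules under the User-agent lines they belong to. It is a simplified illustration, not a full parser: the function name parse_groups and the sample rules are assumptions, and other fields that a real file may contain are simply ignored.

```python
def parse_groups(text):
    """Group Allow/Disallow rules under the User-agent lines above them."""
    groups = []              # list of (agents, rules) pairs
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # strip comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:        # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


# Hypothetical sample file, used only for illustration.
SAMPLE = """\
User-agent: Googlebot
Disallow: /testing/

User-agent: *
Allow: /
"""

for group_agents, group_rules in parse_groups(SAMPLE):
    print(group_agents, group_rules)
```

Each group pairs one or more crawler names with the rules that apply to them, which mirrors how the file is read from top to bottom.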

Let us see some examples of how to construct a robots.txt file.

Allow full website access

User-agent: *
Disallow: 

Disallow full website access

User-agent: *
Disallow: /

Block one particular file

User-agent: *
Disallow: /comingsoon.html

Block one particular directory

User-agent: *
Disallow: /testing/
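If you want to confirm what these four sample files do, Python's standard urllib.robotparser module can evaluate them offline. The sketch below is one way to do that; the helper name allowed, the crawler name ExampleBot and the tested paths other than /comingsoon.html and /testing/ are assumptions made for the demonstration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may crawl `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The four sample files from this article.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_FILE = "User-agent: *\nDisallow: /comingsoon.html\n"
BLOCK_DIR = "User-agent: *\nDisallow: /testing/\n"

print(allowed(ALLOW_ALL, "/index.html"))         # empty Disallow blocks nothing
print(allowed(BLOCK_ALL, "/index.html"))         # "/" blocks every path
print(allowed(BLOCK_FILE, "/comingsoon.html"))   # only this file is blocked
print(allowed(BLOCK_DIR, "/testing/page.html"))  # everything under /testing/
```

Note that rules are matched as path prefixes, so Disallow: /testing/ covers every URL below that directory.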

Block a specific web crawler from a specific folder and from all JPEG files

User-agent: Googlebot
Disallow: /testing/google/
Disallow: /*.jpeg$
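In the last rule, * matches any sequence of characters and $ marks the end of the URL, so the pattern covers every URL that ends in .jpeg. Python's urllib.robotparser does plain prefix matching and does not interpret these wildcards, so the sketch below translates a pattern into a regular expression by hand. It is a simplified illustration: it assumes the folder rule is meant as /testing/google/, the function names are made up, and it ignores Allow rules and per-crawler groups.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with * and $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where "*" stood.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: list) -> bool:
    """Return True if any Disallow pattern matches from the start of path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


GOOGLEBOT_RULES = ["/testing/google/", "/*.jpeg$"]

print(is_blocked("/testing/google/page.html", GOOGLEBOT_RULES))
print(is_blocked("/images/photo.jpeg", GOOGLEBOT_RULES))
print(is_blocked("/images/photo.jpeg?size=large", GOOGLEBOT_RULES))
print(is_blocked("/index.html", GOOGLEBOT_RULES))
```

The third check shows the effect of $: a URL that continues after .jpeg is not covered by the pattern.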

Web crawlers

Some of the well-known crawlers are listed below:

  1. Googlebot (Google)
  2. Googlebot-Image (Google Image Search)
  3. MSNBot (MSN)
  4. Slurp (Yahoo)
  5. Bingbot (Bing)
  6. Teoma (Ask)
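These names are the values a robots.txt file uses in its User-agent lines. As a rough illustration, the Python sketch below uses the standard urllib.robotparser module to show one file giving different answers to different crawlers; the rules and the tested path are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: Googlebot is kept out of /testing/, everyone else is not.
RULES = """\
User-agent: Googlebot
Disallow: /testing/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

for crawler in ("Googlebot", "Bingbot", "Slurp", "Teoma"):
    # can_fetch picks the group whose User-agent matches the crawler name,
    # falling back to the "*" group when no specific group applies.
    print(crawler, parser.can_fetch(crawler, "/testing/page.html"))
```

Only the crawler named in its own group is restricted; the others fall through to the * group.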

Steps to create robots.txt file

  1. To check whether your website already has a robots.txt file, enter www.yourdomain.com/robots.txt in your browser's address bar and see whether the file loads.
  2. If the file is not available, create a plain text file named robots.txt and save it.
  3. If you want the same rules to apply to all crawlers, set User-agent: *
  4. Otherwise, name a specific crawler, for example User-agent: Teoma
  5. To stop crawlers from crawling a folder or file, list it using the Disallow directive.
  6. Place the robots.txt file in the root directory of your web server.
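The steps above can also be scripted. Here is a minimal Python sketch that builds the file contents from a mapping of crawler names to blocked paths and writes robots.txt into a directory. The function names, the sample rules and the use of a temporary directory in place of a real web root are all assumptions for illustration.

```python
import tempfile
from pathlib import Path


def build_robots_txt(rules: dict) -> str:
    """Build robots.txt text from {crawler name: [blocked paths]}."""
    blocks = []
    for agent, disallowed in rules.items():
        lines = [f"User-agent: {agent}"]
        # An empty Disallow value means nothing is blocked for this crawler.
        lines += [f"Disallow: {path}" for path in disallowed] or ["Disallow:"]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


def write_robots_txt(web_root: str, rules: dict) -> Path:
    """Write robots.txt into web_root (the server's root directory)."""
    target = Path(web_root) / "robots.txt"
    target.write_text(build_robots_txt(rules), encoding="utf-8")
    return target


# Demo: a temporary directory stands in for the real web server root.
with tempfile.TemporaryDirectory() as web_root:
    path = write_robots_txt(web_root, {"*": ["/testing/"], "Teoma": []})
    print(path.read_text(encoding="utf-8"))
```

On a real server you would pass the actual document root instead of the temporary directory, so that the file is served at /robots.txt.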

In this article, you learned what a robots.txt file is, how to create it, how to configure it, and how to use it to guide web crawlers.

Article tagged as
SEO
Author
Author: Amit Gupta
Published On: 20/12/2015
Last revised On: 21/12/2015