In this article, you will learn why the robots.txt file matters and how to configure it properly to guide search engine bots as they crawl and index your website.
As we all know, a website appears in a search engine's results only if that search engine has indexed it. Bots from the various search engines crawl websites and index their URLs so that the pages become available in search results. By default, bots crawl the whole website. You can instruct such bots using a robots.txt file. Let us understand what exactly a robots.txt file is and how to configure it.
A robots.txt file is a simple text file placed in the root directory of the web server, so that it is reachable at www.yourdomain.com/robots.txt.
This file tells web crawlers such as Googlebot which files and directories they may access and which they may not. It is important to place the robots.txt file in the root directory, because search engines look for it there and not elsewhere on your website.
So, if it is not in the right place, search engines will not find it and will crawl and index your whole website.
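To illustrate why the location matters, here is a minimal Python sketch of how a crawler might derive the robots.txt address from any page URL. The function name `robots_txt_url` and the sample page URL are assumptions made for this illustration, not part of any crawler's actual code.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt,
    # no matter how deep the page itself sits in the site.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page URL, used only for illustration.
    print(robots_txt_url("https://www.yourdomain.com/testing/page.html?id=7"))
    # prints https://www.yourdomain.com/robots.txt
```

Because the path is fixed, a file saved anywhere else, for example inside a subdirectory, is never requested.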
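Let us understand the structure of a robots.txt file. Each group of rules starts with a User-agent line that names the crawler the rules apply to (* matches every crawler), followed by one or more Disallow lines that list the paths this crawler must not access.
Let us see some examples of how to construct a robots.txt file: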
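Allow all crawlers to access the whole website (an empty Disallow value blocks nothing):

    User-agent: *
    Disallow:

Block all crawlers from the whole website:

    User-agent: *
    Disallow: /

Block all crawlers from a single page:

    User-agent: *
    Disallow: /comingsoon.html

Block all crawlers from a directory:

    User-agent: *
    Disallow: /testing/

Rules that apply only to Googlebot, blocking a directory and every URL that ends in .jpeg:

    User-agent: Googlebot
    Disallow: /testing//google/
    Disallow: /*.jpeg$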
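The first four examples above can be tried out with Python's standard `urllib.robotparser` module. This is only a sketch: the helper `allowed` and the test paths are assumptions made for illustration, and the standard-library parser does simple prefix matching, so the wildcard pattern from the Googlebot example is left out.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, user_agent: str, path: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


# The four generic examples, written one directive per line.
ALLOW_ALL = "User-agent: *\nDisallow:"
BLOCK_ALL = "User-agent: *\nDisallow: /"
BLOCK_PAGE = "User-agent: *\nDisallow: /comingsoon.html"
BLOCK_DIR = "User-agent: *\nDisallow: /testing/"

if __name__ == "__main__":
    print(allowed(ALLOW_ALL, "Googlebot", "/index.html"))        # True
    print(allowed(BLOCK_ALL, "Googlebot", "/index.html"))        # False
    print(allowed(BLOCK_PAGE, "Googlebot", "/comingsoon.html"))  # False
    print(allowed(BLOCK_DIR, "Googlebot", "/testing/page.html")) # False
    print(allowed(BLOCK_DIR, "Googlebot", "/index.html"))        # True
```

Checking rules this way before uploading the file is a cheap guard against accidentally blocking the whole site.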
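Some of the well-known crawlers are Googlebot (Google), Bingbot (Bing), Slurp (Yahoo), DuckDuckBot (DuckDuckGo), Baiduspider (Baidu) and YandexBot (Yandex).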
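To check whether your website already has a robots.txt file, enter www.yourdomain.com/robots.txt in the browser's address bar and check whether it is available.
To create one, write the rules in a plain text file, name it robots.txt and save it in the root directory of the web server.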
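Creating the file can also be scripted. The following Python sketch writes a robots.txt file into a directory; the function `write_robots_txt`, its parameters and the temporary directory standing in for the web root are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def write_robots_txt(web_root: str, disallowed_paths: list[str],
                     user_agent: str = "*") -> Path:
    """Write a single-group robots.txt into web_root and return its path."""
    lines = [f"User-agent: {user_agent}"]
    # With no paths given, emit an empty Disallow, which blocks nothing.
    lines += [f"Disallow: {p}" for p in disallowed_paths] or ["Disallow:"]
    target = Path(web_root) / "robots.txt"
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # A temporary directory stands in for the real web root here.
    with tempfile.TemporaryDirectory() as web_root:
        path = write_robots_txt(web_root, ["/testing/", "/comingsoon.html"])
        print(path.read_text(encoding="utf-8"))
```

On a real server, `web_root` would be the directory that the web server serves as the top level of the site.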
In this article, you learned what a robots.txt file is, how to create and configure it, and how to use it to guide web crawlers.