March 04, 2017


What is robots.txt?



The robots.txt file is a standard used by websites to tell web crawlers which areas of the site should not be crawled. Here are basic examples of robots.txt setups:


If you want to allow full access to your site:


User-agent: *
Disallow:

If you want to block access to your whole site:

User-agent: *
Disallow: /

If you want to block a folder:

User-agent: *
Disallow: /folder/

You need to place the robots.txt file in the root folder of your domain:

www.example.com/robots.txt
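These rules can also be checked programmatically. As an illustrative sketch, Python's standard urllib.robotparser module can parse the "block a folder" rules shown above and answer whether a given URL may be fetched (the URLs here are just examples):

```python
from urllib import robotparser

# Rules mirroring the "block a folder" example above.
rules = [
    "User-agent: *",
    "Disallow: /folder/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Anything under /folder/ is blocked for all user agents...
print(rp.can_fetch("*", "https://www.example.com/folder/page.html"))  # False
# ...while the rest of the site remains crawlable.
print(rp.can_fetch("*", "https://www.example.com/other/page.html"))   # True
```

In practice a crawler would load the live file with `rp.set_url("https://www.example.com/robots.txt")` followed by `rp.read()` instead of passing the lines in by hand.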



Since this file contains important instructions for web crawlers, a crawler must fetch this page first, before crawling the rest of the site.



Do make a note of this: if Googlebot can't crawl your robots.txt file, it won't crawl your site. If your robots.txt file doesn't return a 200 or 404 response code, Googlebot has no way of knowing which pages are blocked, and as a result it won't crawl your site at all.
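The behaviour described above can be summarised as a simple decision rule. The sketch below is an illustration of that logic, not Google's actual implementation; the function name and return strings are made up for this example:

```python
def crawl_decision(robots_status: int) -> str:
    """Map the HTTP status of /robots.txt to a crawl decision,
    following the behaviour described in the article."""
    if robots_status == 200:
        # robots.txt exists: parse it and obey its rules.
        return "parse robots.txt, then crawl allowed pages"
    if robots_status == 404:
        # No robots.txt: assume nothing is blocked.
        return "assume no restrictions, crawl the site"
    # Any other status (e.g. a 503 server error): the crawler can't
    # tell what is blocked, so it skips the site entirely.
    return "do not crawl the site"

print(crawl_decision(200))
print(crawl_decision(404))
print(crawl_decision(503))
```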


This is what Google's Eric Kuan once said on Google Webmaster Help forum:



If Google is having trouble crawling your robots.txt file, it will stop crawling the rest of your site to prevent it from crawling pages that have been blocked by the robots.txt file. If this isn't happening frequently, then it's probably a one off issue you won't need to worry about. If it's happening frequently or if you're worried, you should consider contacting your hosting or service provider to see if they encountered any issues on the date that you saw the crawl error.


Even Gary Illyes from Google recently confirmed the same on Twitter:


And here are a few interesting questions on Twitter, with helpful replies from Gary:



Gary Illyes on robots.txt
- Tejas Thakkar


Published by Unknown at 01:24, with 2 comments
