Don’t Let Your Robots.txt Block Your Site From Webmaster Tools
By Peter Ehat on October 8th, 2014
I’ve recently heard from a few people whose Google Webmaster Tools reports show their entire site blocking the Google spider (Googlebot). That’s not good: if your site blocks Googlebot, Google can’t crawl your pages or read their content, so getting them ranked becomes all but impossible.
“Denied by robots.txt” Error
Some people report seeing this error in Webmaster Tools when they try to submit their site for indexing. Others see reports that many, or even all, of the pages on their site are blocked by robots.txt. It’s very important to get your robots.txt correct. It’s better to not have one at all than to have it misconfigured!
How to Fix the Issue
The simplest way to ensure that Googlebot is crawling your site is to check the root of your site (the directory on your server where your site files are publicly viewable—usually something like httpdocs or public_html or www) for the robots.txt file. If you have one there, great! If you don’t have one there, go ahead and create one.
If you don’t know how to find the root directory or folder for your website, you might consider searching the web for more information specific to your hosting environment, calling your website hosting provider, or even watching a video (albeit a little technical) on how to find it.
What Should Be in My robots.txt File?
If you’re like me, you’re going to want to keep things pretty simple. There are typically only a few files and directories in your root folder that you don’t want indexed. If something shouldn’t be public (or searchable on Google) at all, consider placing it elsewhere on your server or behind a login, because robots.txt is itself publicly readable and only asks crawlers to stay away.
If you want to expose your entire site to Google (recommended), write the following (and only the following) in your robots.txt file:
How to Allow Googlebot Indexing on My Entire Site
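A minimal sketch of that file, assuming you want the rule to apply to every crawler. The `User-agent: *` wildcard is an assumption; swap in `User-agent: Googlebot` if you only want to address Google's crawler.

```text
# Applies to all crawlers (the wildcard is an assumption, not a requirement)
User-agent: *
# Permit crawling of every path on the site
Allow: /
```

Lines starting with `#` are comments and can be left out. An empty `Disallow:` line in place of `Allow: /` has the same effect.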
Cool, huh? It’s that easy.
How to Allow Googlebot Indexing on My Entire Site, but Disallow Two Directories
You could also let Googlebot assume that you want to allow indexing on everything, and only disallow a few directories. In that case you might include the following in your robots.txt file:
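A sketch of such a file, assuming the two directories are named wp-admin and hidden and sit directly in the site root (the `User-agent: *` line is again an assumption):

```text
User-agent: *
# No Allow line is needed: anything not disallowed stays crawlable
Disallow: /wp-admin/
Disallow: /hidden/
```

Each path is written relative to the site root, and the trailing slash limits the rule to that directory and everything inside it.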
Because we have not declared anything for “Allow”, Googlebot assumes it may crawl everything that isn’t explicitly disallowed. Leaving the Allow directive out is just as good as declaring “Allow: /”. The first Disallow directive tells Googlebot to skip the wp-admin folder within my site’s root folder (which might be helpful for WordPress users). The second Disallow directive tells Googlebot to also skip the “hidden” directory in my site’s root.
Test it Out
Make sure you test your robots.txt in Google Webmaster Tools. You can do this using their nifty and simple testing tool: https://www.google.com/webmasters/tools/robots-testing-tool. Make sure your results show zero errors and zero warnings. Fix any problems the tool reports and you’ll be in great shape.
Help, I Still Have Errors!
Do you still have errors in Webmaster Tools preventing you from getting your site indexed? There could be something more complex going on that we can probably help with. Contact us today and we’ll look into it and let you know if we can help!