
Google has added some new features to Webmaster Tools under one of its most popular sections, Crawl Errors.

To make the data easier to understand, Google splits these errors into two types: site errors and URL errors.

Site errors: These errors are not tied to a particular URL; they affect the entire site and can prevent Googlebot from accessing it at all.

These Site errors include:  

  • DNS errors: Googlebot didn’t crawl the URL because it couldn’t communicate with the DNS server. This could be because the server is down, or because there’s an issue with the DNS routing to the domain.
  • Server connectivity: Googlebot couldn’t access the site because the request timed out or because the site is blocking Google.
  • Fetching robots.txt: Before Googlebot crawls the site, it fetches the robots.txt file to determine whether the site is blocking Google from crawling any pages or URLs. If the robots.txt file exists but doesn’t return a 200 or 404 HTTP status code, Googlebot postpones the crawl rather than risk crawling disallowed URLs.
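The DNS check and the robots.txt rule above can be sketched as two small helpers. This is an illustrative sketch, not Google's actual implementation; the function names are hypothetical, but the 200/404 rule follows the behavior described above.

```python
import socket


def dns_resolves(hostname):
    """Return True if the hostname resolves in DNS, mimicking the
    communication check Googlebot performs before crawling."""
    try:
        socket.getaddrinfo(hostname, 80)
        return True
    except socket.gaierror:
        return False


def robots_txt_crawl_decision(status_code):
    """Apply the robots.txt rule described above:
    200 (file readable) or 404 (no file) -> safe to crawl;
    any other status -> postpone rather than risk crawling
    disallowed URLs."""
    return "crawl" if status_code in (200, 404) else "postpone"
```

For example, a robots.txt that returns a 503 during server trouble would postpone the crawl, even if the file normally allows everything.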

If the website doesn’t have any errors in these areas, Webmaster Tools simply shows check marks under Site errors to let you know everything is fine.

URL errors: These errors are specific to a particular page or URL. Google divides URL errors into categories based on what caused the error.

  • Not found: When someone requests a page that doesn’t exist, the server returns a 404 (Not Found) error.
  • Not followed: Features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash can make it difficult for search engines to crawl a site.
  • Access denied: The robots.txt file is blocking Google from accessing the whole site or individual URLs or directories.
  • Server error: Googlebot couldn’t access the URL, the request timed out, or the site was busy, so Googlebot was forced to abandon the request.
  • Soft 404: A “soft 404” is a page whose content looks like a 404 error message, but which is served with a “200 OK” HTTP status.
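The soft 404 case can be illustrated with a simple heuristic: the server reports success, but the body reads like an error page. The phrase list here is purely illustrative and is not Google's actual classifier.

```python
def looks_like_soft_404(status_code, html_text):
    """Heuristic soft-404 detector: the server says 200 OK, but the
    page body reads like an error page. Illustrative only."""
    if status_code != 200:
        return False  # a real 404/410 is not a *soft* 404
    body = html_text.lower()
    error_phrases = ("page not found", "404 error",
                     "does not exist", "no longer available")
    return any(phrase in body for phrase in error_phrases)
```

The correct fix for a soft 404 is usually to make the server return a real 404 (or 410) status for missing pages, so crawlers don't index the error page as content.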

The current user interface only lets you download 1,000 errors from Webmaster Tools, but with the GData API you can download up to 100,000 URLs for each error type, along with details for much of what would otherwise be missing.
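Pulling those larger batches means paging through the GData feed. The helper below builds a paged request URL; the `crawlissues` endpoint path is an assumption recalled from the GData-era API, so verify it against Google's current documentation before relying on it.

```python
from urllib.parse import quote


def crawl_issues_feed_url(site_url, start_index=1, max_results=100):
    """Build a paged request URL for the Webmaster Tools GData
    crawl-issues feed. The endpoint path is an assumption based on
    the GData API and may differ from the live service."""
    base = "https://www.google.com/webmasters/tools/feeds"
    site = quote(site_url, safe="")  # the site URL is itself URL-encoded
    return (f"{base}/{site}/crawlissues/"
            f"?start-index={start_index}&max-results={max_results}")
```

To fetch all 100,000 entries you would call this in a loop, advancing `start_index` by `max_results` until the feed returns fewer entries than requested.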

If you click any error URL in the list, a detail panel opens with more information about that particular page, including when it was last crawled, when Google first noticed the problem, and a brief explanation of the error. You can also see other pages that link to the URL, which helps in fixing the error. Once you’ve fixed the issue, you can test the fix by fetching the URL as Googlebot, and then remove the error from your list. In the future, errors marked as fixed won’t be included in the top errors list unless Googlebot encounters the same error when re-crawling the URL.
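The "fetch as Googlebot" verification step can be approximated from your own machine by requesting the URL with Googlebot's user-agent string. This is a hedged sketch: it checks only the HTTP status your server returns to that user-agent, not Google's full rendering pipeline, and the helper names are hypothetical.

```python
import urllib.request

# Googlebot's published user-agent string
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")


def fetch_as_googlebot(url):
    """Request the URL with Googlebot's user-agent and return the
    HTTP status code (makes a live network call)."""
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    with urllib.request.urlopen(req) as resp:
        return resp.status


def fix_verified(status_code):
    """A 200 response suggests the fix worked; a 4xx/5xx means the
    error would likely reappear on Googlebot's next crawl."""
    return status_code == 200
```

A quick `fix_verified(fetch_as_googlebot("http://example.com/fixed-page"))` check before marking the error as fixed saves waiting for Googlebot's next crawl to find out.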

The advantage of this new version of the Crawl Errors feature is that webmasters can focus on fixing the most important errors first. Google ranks the errors based on multiple factors, so those at the top of the priority list are the ones where there’s something you can actually do.