If you search Google for "What is crawl budget?", you will find many definitions. We understand there is confusion around crawl budget, and the good news is that Google thinks the same.
In a recent post, Gary Illyes from Google clears up the confusion. He explains what crawl budget is, how the crawl rate limit works, what crawl demand is, and which factors impact a site's crawl budget.
First of all, not all publishers have to worry about it. If Google tends to crawl your new pages the same day they are published, crawl budget is not something you need to focus on.
What is Crawl rate limit in Crawl Budget?
Googlebot is designed for crawling, but it must also avoid degrading the experience of a site's visitors.
So Google limits the maximum fetching rate for a given site. This is known as the "crawl rate limit".
The crawl rate can fluctuate based on a few factors, including:
- Crawl health: if the site responds quickly for a while, the limit goes up, meaning Googlebot can use more connections to crawl. If the site slows down or responds with server errors, the limit goes down and Googlebot crawls less.
- Limit set in Search Console: you can reduce Googlebot's crawling of your site. Note that setting a higher limit does not automatically increase crawling.
What is Crawl Demand?
If there is no demand from indexing, Googlebot will be less active, even if the crawl rate limit is not reached. Two factors play a significant role in determining crawl demand:
- Popularity: URLs that are more popular on the Internet tend to be crawled more often to keep them fresher in Google's index.
- Staleness: Google's systems attempt to prevent URLs from becoming stale in the index.
Also, site-wide events like site moves may trigger an increase in crawl demand in order to reindex the content under the new URLs.
Putting crawl rate and crawl demand together, crawl budget is the number of URLs Googlebot can and wants to crawl.
Factors affecting crawl budget
As per Google, having many low-value-add URLs can negatively affect a site's crawling and indexing. These URLs fall into the following categories:
- Faceted navigation and session identifiers
- On-site duplicate content
- Soft error pages
- Hacked pages
- Infinite spaces and proxies
- Low quality and spam content
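One common way to keep Googlebot away from faceted-navigation and session-identifier URLs is a robots.txt rule. Below is a minimal sketch; the parameter names (`sessionid`, `sort`, `color`) are hypothetical examples, so adjust them to whatever parameters your site actually generates:

```
User-agent: *
# Hypothetical faceted-navigation and session parameters
Disallow: /*?sessionid=
Disallow: /*?sort=
Disallow: /*?color=
```

Blocking such URLs stops Googlebot from spending its budget on near-duplicate pages, but be careful not to disallow parameters that produce unique content you want indexed.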
To learn how to optimize crawling of your site, take a look at Google's blog post on optimizing crawling from 2009. It is still applicable.
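Before optimizing anything, it helps to see how Googlebot actually spends its budget on your site. A minimal sketch of that, assuming your server writes combined-format access logs (the log lines below are made up for illustration):

```python
import re
from collections import Counter

# Matches a combined-format access log line: request, status, size,
# referer, and user agent. Only the URL and user agent are captured.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" \d{3} \S+ "[^"]*" "([^"]*)"'
)

def googlebot_hits(lines):
    """Return a Counter mapping URL -> number of Googlebot requests."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and "Googlebot" in m.group(2):
            hits[m.group(1)] += 1
    return hits

# Made-up sample log lines: two Googlebot fetches, one regular visitor.
sample = [
    '1.2.3.4 - - [10/Feb/2017:10:00:00 +0000] "GET /page-a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '5.6.7.8 - - [10/Feb/2017:10:00:01 +0000] "GET /page-a HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [10/Feb/2017:10:00:02 +0000] "GET /page-b HTTP/1.1" 404 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
print(googlebot_hits(sample))
```

If the counts are dominated by faceted-navigation URLs, soft-404s, or other low-value-add pages from the list above, that is where your crawl budget is leaking. (Note that some crawlers fake the Googlebot user agent, so for real analysis you would also verify the requesting IP.)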