Googlebot only crawls and indexes first 15MB of HTML content on page

Posted by Edith MacLeod on 27 Jun, 2022
Google has documented an existing policy. What are the SEO takeaways?

In an update to its Googlebot Help document, Google has specified that Googlebot will only crawl and index the first 15MB of an HTML file or supported text-based file.

"Googlebot can crawl the first 15MB of an HTML file or supported text-based file. Any resources referenced in the HTML such as images, videos, CSS, and JavaScript are fetched separately. After the first 15MB of the file, Googlebot stops crawling and only considers the first 15MB of the file for indexing. The file size limit is applied on the uncompressed data. Other crawlers may have different limits."

The update caused some head scratching among SEOs. For example, would images count towards the size limit, meaning that any text below the point where images had used up the limit would simply be ignored?

In response, Google’s John Mueller tweeted on 24 June to clarify that embedded resources, or content referenced with IMG tags, do not count as part of the HTML file.

John Mueller also confirmed that this is not a change, just official documentation of an already existing policy.

SEO best practice

Google has now put on the record what the crawl cutoff is for Googlebot. 15MB is a large amount, however, so there’s no need for undue worry.

It’s good practice (also editorially) to place important content at the top of the page to ensure it’s not missed, so Google can rank your page appropriately.
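As a rough illustration of the point about placement, the Python sketch below finds where a key phrase first appears in a page's uncompressed HTML and reports whether it sits inside the first 15MB. The function name `phrase_offset` is hypothetical. Reading 15MB as 15 × 1024 × 1024 bytes is also an assumption, since Google's wording does not say whether it means binary or decimal megabytes.

```python
# Assumption: "15MB" is treated as 15 * 1024 * 1024 bytes.
LIMIT_BYTES = 15 * 1024 * 1024


def phrase_offset(html: bytes, phrase: str, limit: int = LIMIT_BYTES):
    """Locate the first occurrence of `phrase` in uncompressed HTML bytes.

    Returns (offset, within_limit). If the phrase is absent, returns
    (None, False). `within_limit` is True only when the whole phrase
    ends at or before the byte limit.
    """
    needle = phrase.encode("utf-8")
    offset = html.find(needle)
    if offset == -1:
        return None, False
    return offset, offset + len(needle) <= limit


if __name__ == "__main__":
    page = b"<html><body><h1>Key message</h1></body></html>"
    print(phrase_offset(page, "Key message"))
```

For almost any real page the phrase will sit comfortably inside the limit. The check is only of interest for unusually heavy HTML files.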

It’s also a good idea to keep your web pages light. This is better both for users, who will just move on if your page takes too long to load, and for crawlers such as Googlebot. 

You can check your HTML page size with free tools such as sitechecker. You can also use the URL Inspection tool in Search Console to see which parts of the page Google renders and sees.
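If you would rather check the size yourself, a minimal Python sketch along these lines compares a page's uncompressed HTML with the 15MB figure. The helper names `size_report` and `fetch_uncompressed` are made up for this example, and the limit is again assumed to mean 15 × 1024 × 1024 bytes. It is not a reproduction of how Googlebot measures a file.

```python
import gzip
import urllib.request

# Assumption: "15MB" is treated as 15 * 1024 * 1024 bytes.
LIMIT_BYTES = 15 * 1024 * 1024


def size_report(html: bytes, limit: int = LIMIT_BYTES) -> dict:
    """Compare an uncompressed HTML payload with the crawl limit."""
    size = len(html)
    return {
        "size_bytes": size,
        "percent_of_limit": round(100 * size / limit, 2),
        "over_limit": size > limit,
        "bytes_past_limit": max(0, size - limit),
    }


def fetch_uncompressed(url: str) -> bytes:
    """Download a page and return its body as uncompressed bytes.

    The limit applies to uncompressed data, so a gzip-encoded
    response is decompressed before it is measured.
    """
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
        if response.headers.get("Content-Encoding", "").lower() == "gzip":
            body = gzip.decompress(body)
    return body


if __name__ == "__main__":
    # A stand-in page of roughly 30KB, so no network access is needed.
    sample = b"<html><body>" + b"x" * 30_000 + b"</body></html>"
    print(size_report(sample))
```

To measure a live page, pass the result of `fetch_uncompressed("https://example.com/")` to `size_report`, replacing the placeholder URL with your own.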

Update: In light of the confusion caused by this documentation of the crawl limit, Google published a blog post clarifying which content the 15MB limit applies to.

The post reiterates that, with the median size of an HTML file being 30KB, the overwhelming majority of users will not be affected by this crawl limit. Google adds:

"However, if you are the owner of an HTML page that's over 15 MB, perhaps you could at least move some inline scripts and CSS dust to external files, pretty please."
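To see how much of a page's HTML is taken up by inline scripts and CSS, something like the following Python sketch could help. It uses the standard library's `html.parser` and counts the bytes inside `<style>` elements and inside `<script>` elements that have no `src` attribute. The class and function names are hypothetical, and the count ignores inline `style="..."` attributes and event handlers.

```python
from html.parser import HTMLParser


class InlineWeight(HTMLParser):
    """Tally bytes of inline <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.current = None  # tag currently being counted, if any
        self.totals = {"script": 0, "style": 0}

    def handle_starttag(self, tag, attrs):
        # A <script> with a src attribute is external, so it is skipped.
        if tag == "style" or (tag == "script" and "src" not in dict(attrs)):
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current:
            self.totals[self.current] += len(data.encode("utf-8"))


def inline_bytes(html: str) -> dict:
    """Return byte counts for inline script and style content."""
    parser = InlineWeight()
    parser.feed(html)
    parser.close()
    return parser.totals


if __name__ == "__main__":
    page = (
        "<html><head><style>p{}</style>"
        '<script src="a.js"></script></head>'
        "<body><script>var a=1;</script><p>hi</p></body></html>"
    )
    print(inline_bytes(page))
```

If the totals make up a large share of the file, moving that code into external files reduces the size of the HTML itself, since referenced resources are fetched separately.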

Read full details in Google's Search Central blog.
