In an update to its Googlebot Help document, Google has specified that Googlebot will only crawl and index the first 15MB of an HTML file or supported text-based file.
The update caused some head scratching among SEOs. For example, would images count towards the size limit, meaning that any text below the point where images had pushed the page past the limit would just be ignored?
In response, Google’s John Mueller tweeted on 24 June to clarify that embedded resources and content pulled in with IMG tags do not count as part of the HTML file.
John Mueller also confirmed that this is not a change, just official documentation of an already existing policy.
SEO best practice
Google has now put on the record what the crawl cutoff is for Googlebot. 15MB is a large amount, however, so there’s no need for undue worry.
It’s good practice (also editorially) to place important content at the top of the page to ensure it’s not missed, so Google can rank your page appropriately.
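As a rough illustration of the point above, the sketch below reports how many bytes into the HTML a key phrase first appears and whether it falls inside the limit. The function name `content_offset`, the reading of "15MB" as 15 × 1024 × 1024 bytes, and the assumption that the page is UTF-8 encoded are all ours, not something Google specifies.

```python
# Assumption: "15MB" is read as 15 * 1024 * 1024 bytes.
LIMIT_BYTES = 15 * 1024 * 1024


def content_offset(html_bytes, phrase, limit=LIMIT_BYTES):
    """Return (byte_offset, within_limit) for the first occurrence of phrase.

    Returns (None, False) if the phrase does not appear at all.
    Assumes the page is UTF-8 encoded (hypothetical helper, not a Google API).
    """
    needle = phrase.encode("utf-8")
    offset = html_bytes.find(needle)
    if offset == -1:
        return None, False
    # The phrase counts as "within the limit" only if it ends before the cutoff.
    return offset, offset + len(needle) <= limit
```

For almost any real page the offset will be a tiny fraction of the limit, which is the point: the check only matters for unusually heavy HTML.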
It’s also a good idea to keep your web pages light. This is better both for users, who will just move on if your page takes too long to load, and for crawlers such as Googlebot.
You can check your HTML page size with free tools such as sitechecker, and you can use the URL Inspection tool in Search Console to see which parts of the page Google renders.
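If you would rather check the size yourself, a minimal sketch using only the Python standard library could look like the following. The helper names, the User-Agent string and the reading of "15MB" as 15 × 1024 × 1024 bytes are assumptions made for illustration; the sketch also assumes it is the uncompressed HTML that counts, and it measures only the HTML document, not images or other referenced files.

```python
import urllib.request

# Assumption: "15MB" is read as 15 * 1024 * 1024 bytes.
GOOGLEBOT_LIMIT_BYTES = 15 * 1024 * 1024


def html_size_report(html_bytes, limit=GOOGLEBOT_LIMIT_BYTES):
    """Summarise the size of an HTML document against the crawl limit."""
    size = len(html_bytes)
    return {
        "size_bytes": size,
        "limit_bytes": limit,
        "percent_of_limit": round(100 * size / limit, 2),
        "over_limit": size > limit,
    }


def fetch_html(url, timeout=10.0):
    """Download the raw HTML for a URL (hypothetical helper).

    urllib does not request compressed responses by default, so the
    bytes returned are normally the uncompressed document.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": "page-size-check/0.1"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `html_size_report(fetch_html(your_url))` on one of your own pages returns the size in bytes and how much of the limit it uses.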
Update: In light of the confusion caused by this documentation of the crawl limit, Google published a blog post clarifying the content the 15MB limit applies to.
The post reiterates that, with the existing median size for an HTML file being 30KB, the overwhelming majority of site owners will not be affected by this crawl limit. Google adds:
"However, if you are the owner of an HTML page that's over 15 MB, perhaps you could at least move some inline scripts and CSS dust to external files, pretty please."
Read full details in Google's Search Central blog.
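To act on Google's suggestion about inline scripts and CSS, it helps to know how much of a page's HTML they take up. The sketch below uses Python's built-in `html.parser` to total the bytes inside `<style>` blocks and `<script>` blocks that have no `src` attribute. The class and function names are ours, and the count assumes UTF-8; treat it as an estimate rather than an exact audit.

```python
from html.parser import HTMLParser


class InlineAssetCounter(HTMLParser):
    """Totals the bytes of inline <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._current = None  # "script", "style", or None
        self.inline_bytes = {"script": 0, "style": 0}

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._current = "style"
        elif tag == "script" and not any(name == "src" for name, _ in attrs):
            # Scripts with a src attribute are external, so they are skipped.
            self._current = "script"

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.inline_bytes[self._current] += len(data.encode("utf-8"))


def inline_asset_bytes(html_text):
    """Return a dict with the inline script and style byte totals."""
    parser = InlineAssetCounter()
    parser.feed(html_text)
    parser.close()
    return parser.inline_bytes
```

If the totals make up a large share of the page, moving that code into external files reduces the HTML size, since externally referenced files are fetched separately.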