Google to explore alternatives to robots.txt

Web controls.

Google has called for a public discussion on how AI and other new systems access and use content from websites. The call comes just weeks after it updated its privacy policy to include the use of publicly collected data to train AI models.

In a blog post this week, Google said the emergence of new technologies presented opportunities for the web community to evolve new standards and protocols which supported the web’s future development.

“We believe everyone benefits from a vibrant content ecosystem. Key to that is web publishers having choice and control over their content, and opportunities to derive value from participating in the web ecosystem.”

Existing web publisher controls such as robots.txt, the current way for publishers to control how search engines crawl their content, were developed nearly 30 years ago. As a result, they have no mechanism to address how AI systems may use data, for example to train their algorithms or develop new products.
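
To illustrate that limitation, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt contents, the crawler names `ExampleBot` and `OtherBot`, and the example.com URLs are all hypothetical and do not come from Google's post. The sketch shows that a robots.txt rule answers only one question: may this crawler fetch this path?

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is kept out of /private/,
# every other crawler may fetch anything.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Each rule is a yes/no answer for a (crawler, path) pair.
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/page.html")
allowed = parser.can_fetch("ExampleBot", "https://example.com/blog/post.html")
other = parser.can_fetch("OtherBot", "https://example.com/private/page.html")

print(blocked, allowed, other)  # False True True
```

The file can say who may crawl which paths, but it has no field for stating what fetched content may be used for, such as search indexing versus AI training.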

Google says the emergence of generative AI and new research use cases means it’s now time to explore alternative or additional ways of controlling how web content is crawled and indexed.

“We believe it’s time for the web and AI communities to explore additional machine-readable means for web publisher choice and control for emerging AI and research use cases.”

There is growing recognition that emerging AI technologies use web content in new ways, which raises ethical questions about data use, privacy and bias.

In its blog post Google invites members of the web and AI communities to contribute to a discussion on approaches to complementary protocols.

“We’d like a broad range of voices from across web publishers, civil society, academia and more fields from around the world to join the discussion, and we will be convening those interested in participating over the coming months.”

Google says it will share more information about the process soon, but in the meantime anyone interested in joining the discussion can sign up to Google’s Web Controls mailing list.

Google using web data to train AI models

Google amended its privacy policy in early July to include the use of public data to train and improve AI models such as Bard.

The updated section under publicly accessible sources reads:

“Google uses information to improve our services and to develop new products, features, and technologies that benefit our users and the public. For example, we use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.”

You can read more about the wider considerations, including around privacy, regulation and copyright, in this article on The Verge.

Wordtracker
