Google to explore alternatives to robots.txt

Posted by Edith MacLeod on 10 Jul, 2023
Google calls for public discussion on new web standards and protocols in the wake of generative AI and other new tech.

Image: Tara Winstead on Pexels

Google has called for a public discussion on how AI and other new systems access and use content from websites. The call comes just weeks after it updated its privacy policy to include the use of publicly available data to train AI models.

In a blog post this week, Google said the emergence of new technologies presented opportunities for the web community to evolve new standards and protocols that support the web’s future development.

“We believe everyone benefits from a vibrant content ecosystem. Key to that is web publishers having choice and control over their content, and opportunities to derive value from participating in the web ecosystem.”

Existing web publisher controls such as robots.txt, the current way for publishers to control how search engines crawl their content, were developed nearly 30 years ago. As a result, they have no mechanism to address how AI systems may use data, for example to train their algorithms or develop new products.
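To illustrate what robots.txt can and cannot express, here is a minimal sketch using Python's standard-library parser. The file contents, the crawler names (ExampleBot, OtherBot) and the example.com URLs are hypothetical and chosen only for illustration. The rules say which crawler may fetch which paths; nothing in them describes how fetched content may be used afterwards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site. The crawler names and paths
# are illustrative assumptions, not taken from any real site.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# ExampleBot is blocked from the whole site.
print(parser.can_fetch("ExampleBot", "https://example.com/articles/1"))  # False

# Any other crawler falls back to the "*" group: allowed, except /private/.
print(parser.can_fetch("OtherBot", "https://example.com/articles/1"))    # True
print(parser.can_fetch("OtherBot", "https://example.com/private/x"))     # False
```

The only lever a publisher has here is allowing or disallowing a named crawler on a path, which is why a site cannot use this file alone to permit search indexing while separately opting out of other uses of the same pages by the same crawler.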

Google says the emergence of generative AI and new research use cases means it’s now time to explore alternative or additional ways of controlling the crawling and indexing of web content.

“We believe it’s time for the web and AI communities to explore additional machine-readable means for web publisher choice and control for emerging AI and research use cases.”

There is increasing recognition that emerging AI technologies are using web content in new ways, which raises ethical questions about data use, privacy and bias.

In its blog post Google invites members of the web and AI communities to contribute to a discussion on approaches to complementary protocols.

“We’d like a broad range of voices from across web publishers, civil society, academia and more fields from around the world to join the discussion, and we will be convening those interested in participating over the coming months.”

Google says it will share more information about the process soon, but in the meantime anyone interested in joining the discussion can sign up to Google’s Web Controls mailing list.

Google using web data to train AI models

Google amended its privacy policy in early July to include the use of public data to train and improve Bard and other AI models.

The updated section under publicly accessible sources reads:

“Google uses information to improve our services and to develop new products, features, and technologies that benefit our users and the public. For example, we use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.”

You can read more about the wider considerations, including around privacy, regulation and copyright, in this article on The Verge.
