Multisearch is the ability to combine text and images in a single query. Importantly, it lets you refine image searches with keywords: taking an initial image result and requesting variations of it, or narrowing the search down further via text.
This relies on enormous amounts of processing power on Google's end to apply AI-driven categorisation of images so they can be understood via text. Connecting the two together is really easy for humans, but really, really hard for computers.
Ask anyone to describe an object and it's easy for them to highlight its basic parameters and properties. For a computer, the same task is much trickier: identifying the colour, or indeed identifying which object is the focus of the question. Show someone a picture of a red box on a table, ask them 'what colour is the box', and the answer is obviously red. But there is a huge amount of inferred logic in that question. The computer first needs to work out which object is the box and which is the table, and so on. We work from frames of reference developed over our lifetimes that let us infer the information needed to answer these types of questions easily.
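To make that inference step concrete, here is a toy sketch in Python. It is entirely hypothetical and nothing like Google's real pipeline: it assumes a vision model has already detected the objects in the scene, and shows that the computer still has to decide which detection the question is actually about before it can answer.

```python
# Toy illustration of question grounding (not Google's actual system):
# given objects a detector has already found, pick the one the
# question refers to and report its colour.

def answer_colour_question(question: str, detections: list[dict]) -> str:
    """Return the colour of the detected object named in the question."""
    for obj in detections:
        if obj["label"] in question.lower():
            return obj["colour"]
    return "unknown"

# Hypothetical detector output for "a red box on a table".
scene = [
    {"label": "box", "colour": "red"},
    {"label": "table", "colour": "brown"},
]

print(answer_colour_question("What colour is the box?", scene))  # red
```

Even in this contrived form, the hard part is visible: the answer depends on matching a word in the question to the right object in the scene, which humans do without noticing.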
Google's recent announcement, made as part of its I/O 2022 conference, is about bridging this gap between the way we and computers perceive the same scene and derive information from it.
Though typing words into a search box has become second nature for many of us, it’s far from the most natural way to express what we need. For example, if I’m walking down the street and see an interesting tree, I might point to it and ask a friend what species it is and if they know of any nearby nurseries that might sell seeds. If I were to express that question to a search engine just a few years ago… well, it would have taken a lot of queries.
— Senior Vice President, Google
In other words, they want to expand search beyond the search box, something that has been in Google's sights for a long time: letting us interact with the world in more ways than typing words into a box.
Today, we're redefining Google Search yet again, combining our understanding of all types of information — text, voice, visual and more — so you can find helpful information about whatever you see, hear and experience, in whichever ways are most intuitive to you.
This is in part due to the way we are changing how we interact with the world around us, such as the rise in mobile searches. Accordingly, Google is beefing up what multisearch is able to do, such as adding the ability to localise the results:
When you use multisearch to find it near you, Google scans millions of images and reviews posted on web pages, and from our community of Maps contributors, to find results about nearby spots that offer the dish so you can go enjoy it for yourself.
This has yet to launch, but it's coming later this year, according to Google.
That's not all that's been added, though, with 'scene exploration' also being promised. This uses augmented reality to overlay information on a scene, updating in real time as you move your phone around.
Whenever these changes arrive, we can be sure Google will keep developing and changing, and we'll be seeing even more initiatives in the near future that help us search beyond the box.