The news of Google’s recent firing of Timnit Gebru has taken the online world by storm. Gebru co-authored Gender Shades, a project that “evaluates the accuracy of AI-powered gender classification products.” In other words, the research tested whether artificial intelligence and machine learning could correctly predict a person’s gender based on a photo of their face. It found that the commercial systems it audited often could, but with far higher error rates for some groups, notably darker-skinned women, than for others.
Google, Microsoft, and the other big tech companies are investing heavily in machine learning and artificial intelligence, and that investment is steadily improving search engine performance.
Self-Updating Algorithms
Google engineers used to roll out algorithm updates, big and small, by hand. Now, artificial intelligence capabilities let the algorithm update itself.
Enter RankBrain.
RankBrain is Google’s machine learning algorithm. As the hundreds of contextual variables Google reads shift, RankBrain adjusts itself to keep pace with changes in the search landscape.
Of the trillions of queries Google sees every year, 15 percent are entirely new. New words and phrases get created constantly. The world is a vast place: new songs, movies, books, political events, social trends, people, products, and ideas surface every year.
RankBrain can read these contextual clues and serve users relevant results based on them.
Consider the following example. You just searched for “orange soda”, a phrase that could refer to any of the following: a new, imaginary show entitled Orange Soda that just came out on Netflix; the drink orange soda, which is gaining in popularity; a restaurant in your city named Orange Soda; or a town in Illinois called Orange Soda.
Which should Google show you information for? RankBrain weighs the variables to decide which contextual data points to emphasize. In our first example, you’re on your laptop in your house, logged into your Google Chrome account. It’s 8:30pm, and you just searched for TV shows to watch. Based on all of these clues, Google is going to guess that you’re looking up the new show, and it will serve search results accordingly.
In another example, you just left work at 5pm. Then you used voice search on your phone to ask, “What time does Orange Soda close?” Given these factors, and the fact that you’re only 2 miles from the Orange Soda restaurant, Google will serve up results about the restaurant, and not the TV show or drink.
Similarly, if you Google, “directions to orange soda” or “orange soda in bulk”, those keyword phrases will indicate which search results Google needs to show you.
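To make this concrete, here’s a toy sketch of how contextual signals might be combined to score competing interpretations of an ambiguous query. Everything here, the interpretations, the signals, and the weights, is invented for illustration; it is not Google’s actual algorithm, which learns such weightings from data rather than hard-coding them.

```python
# A toy model of contextual disambiguation. The interpretations, signals,
# and weights below are all invented for illustration; a real system would
# learn them from data rather than hard-code them.

# Candidate meanings of the query "orange soda", each with the contextual
# signals that make that meaning more likely.
INTERPRETATIONS = {
    "netflix_show":  {"recent_tv_searches": 3.0, "evening": 1.0, "on_laptop": 0.5},
    "soft_drink":    {"shopping_terms": 2.0},
    "restaurant":    {"asked_closing_time": 3.0, "nearby_restaurant": 2.0, "on_phone": 0.5},
    "illinois_town": {"asked_directions": 3.0},
}

def score(interpretation: str, context: dict) -> float:
    """Sum the weights of every signal that is active in this context."""
    weights = INTERPRETATIONS[interpretation]
    return sum(w for signal, w in weights.items() if context.get(signal))

def best_interpretation(context: dict) -> str:
    """Pick the interpretation with the highest contextual score."""
    return max(INTERPRETATIONS, key=lambda i: score(i, context))

# 8:30pm on your laptop, right after searching for TV shows:
print(best_interpretation(
    {"recent_tv_searches": True, "evening": True, "on_laptop": True}))
# -> netflix_show

# 5pm voice search on your phone, two miles from the restaurant:
print(best_interpretation(
    {"asked_closing_time": True, "nearby_restaurant": True, "on_phone": True}))
# -> restaurant
```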
Natural Language Processing
In the examples above, a person would instantly understand what’s being said from the context. Search engines are still learning this.
Even with contextual clues, they often struggle. The most recent search engine advances use analytics feedback and voice search data to train models on what people actually mean.
Voice search has been increasing steadily over the past few years. This gives Google and Bing larger data sets to train their machine learning models on. It gives the search engines practice with responding to human language phrases like, “What time is it in Miami?” and “Is there a lot of traffic on the I-5 right now?”
As this “natural language” database expands, search engines get better at responding to these types of queries. That matters because people speak differently than they write, so spoken queries, together with the user feedback they generate, train the models on patterns that written text alone wouldn’t capture.
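As a minimal sketch of what that training looks like in miniature, the snippet below fits a tiny intent classifier on a handful of spoken-style queries using scikit-learn. The queries, intent labels, and library choice are my own illustrations; nothing here reflects how Google or Bing actually build their models.

```python
# A tiny, illustrative intent classifier for spoken-style queries.
# The example queries and labels are invented; real systems train on
# vastly larger voice search datasets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

training_data = [
    ("what time is it in miami", "time_lookup"),
    ("current time in tokyo", "time_lookup"),
    ("is there a lot of traffic on the i-5 right now", "traffic"),
    ("how bad is the traffic on the 405", "traffic"),
    ("what time does orange soda close", "business_hours"),
    ("when does the pharmacy open", "business_hours"),
]
texts, labels = zip(*training_data)

# Turn each query into word/bigram features, then fit a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# With so little data this is only a demonstration, but the new phrasing
# below should still map to the "traffic" intent.
print(model.predict(["what's the traffic like on i-95"]))
```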
Customer journey analytics techniques enable software to react to people’s choices. Oftentimes a user’s intent doesn’t become clear until after a series of different search strings, and those sequences help train the search engine on what people mean. For example, when users search a general string like “seafood restaurants”, they may not like the results that they see.
If the search results are unsatisfactory, they may follow up with searches like, “seafood restaurants near me” or “restaurants with clam chowder near me”.
It’s these sequences of human language queries that give the search engines the data they need. After seeing the search phrase “seafood restaurants” followed by “restaurants with clam chowder near me” a hundred times, they will start to show one as an autosuggest phrase for the other.
And perhaps they will even start to show similar search results for both, because user data has indicated that the underlying search intent is the same, even if the wording is completely different.
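A rough sketch of that idea: count how often one query immediately follows another in user sessions, and promote frequent follow-ups as autosuggest or related-intent candidates. The session data and threshold below are invented for illustration.

```python
# Counting query refinements in toy session logs. The sessions and the
# threshold are invented; real systems mine billions of query sequences.
from collections import Counter

# Each inner list is one user's consecutive searches.
sessions = [
    ["seafood restaurants", "seafood restaurants near me"],
    ["seafood restaurants", "restaurants with clam chowder near me"],
    ["seafood restaurants", "restaurants with clam chowder near me"],
]

# Count how often one query is immediately followed by another.
followups = Counter()
for session in sessions:
    for first, second in zip(session, session[1:]):
        followups[(first, second)] += 1

# Frequent follow-ups to "seafood restaurants" become candidates for
# autosuggest, or for sharing search results with the original query.
candidates = [pair[1] for pair, count in followups.items()
              if pair[0] == "seafood restaurants" and count >= 2]
print(candidates)  # -> ['restaurants with clam chowder near me']
```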
The real-time, context-based nature of communication is what makes search technology so difficult, complex, and fascinating.
One of the biggest machine learning challenges in search right now is figuring out this “natural language processing”.
Images and Multimedia
Despite the computer vision advances that projects like Gender Shades put to the test, search engines still struggle to read images and other non-text media formats like video and audio. Computers still can’t consistently recognize images anywhere close to the level that humans can, and video is even more difficult for them to understand.
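To make “reading an image” concrete, here’s a minimal sketch of what machine image recognition typically looks like today, using an off-the-shelf pretrained classifier (torchvision’s ResNet-18, chosen purely for illustration; the file name is hypothetical). The model returns one best-guess label out of a fixed list of categories, which is useful but a long way from human-level understanding of a scene.

```python
# A minimal sketch of machine image recognition with a pretrained model.
# torchvision's ResNet-18 is used purely for illustration; "photo.jpg" is
# a hypothetical local file.
import torch
from PIL import Image
from torchvision import models

weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()  # inference mode: disables training-time behavior like dropout

preprocess = weights.transforms()  # the resizing/normalization this model expects

img = Image.open("photo.jpg").convert("RGB")
with torch.no_grad():
    logits = model(preprocess(img).unsqueeze(0))  # add a batch dimension

# The single best guess out of 1,000 ImageNet categories: a label,
# not an understanding of the scene.
best = logits.argmax(dim=1).item()
print(weights.meta["categories"][best])
```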
If and when artificial intelligence improves to a level comparable to the human brain, search engines will be able to incorporate multimedia more fully into their search results.
Until then, the written word will continue to dominate search platforms.
Story by Garit Boothe