Computers versus Humans

ABOUT GOOGLE

Copyright ©2010 by T. Pavlidis

Text Retrieval - Looking for Stories on a Subject

Suppose you want to find an article about headhunters, groups of people who kill others and collect their heads. Unfortunately for you, the most common meaning of the word headhunter is that of a corporate recruiter, so that if you type "headhunter" in Google the returns will be for the corporate type. So you need to specify a bit more. You may try, for example, "headhunter amazon" because you have heard that headhunting was practiced amongst primitive people living in the Amazon rain forest. Google will return some items that are indeed close to what you are looking for but it will also return several irrelevant ones, for example about a music band called "Headhunters" whose albums are sold on amazon.com. Clearly, we must be careful in selecting the words that describe the subject we are interested in. For the current example "headhunter tribes" seems to be a reasonable choice.

Even if the word "headhunter" is used in a document in its literal meaning, its occurrence need not imply that the document deals with that subject. It may be an incidental reference within another topic. One could use the number of occurrences of a phrase in a document, but that is not always reliable. Google relies on user feedback. Here is how it works.
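To make the occurrence-count idea concrete, here is a minimal sketch. The function name `occurrence_count` and the two sample texts are made up for illustration; the tokenization is a deliberately naive assumption (lowercase alphanumeric runs, no stemming, so "headhunters" does not count as "headhunter").

```python
import re

def occurrence_count(term, text):
    """Count how many times `term` appears as a whole word in `text`."""
    # Naive tokenization: lowercase runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return tokens.count(term.lower())

# Hypothetical documents: one only mentions the word in passing,
# the other is actually about the subject.
incidental = "Our trip report covers food, hotels, and one headhunter legend."
on_topic = "The headhunter tribe: how a headhunter was trained."

print(occurrence_count("headhunter", incidental))
print(occurrence_count("headhunter", on_topic))
```

A higher count suggests, but does not prove, that the document is about the subject. A long document can repeat a word many times in passing, which is one reason the count alone is not reliable.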

Google returns links to all the web pages that contain the query words. Typically, there are thousands of them and the challenge is to order them according to the likelihood of being of interest to the user. Google tracks user responses and uses them to order the returns accordingly. If more users have clicked on the link for page A than on the link for page B, then page A is taken to be more relevant to the query than page B. (Clearly, this requires keeping a lot of data around. For each page Google has a list of the queries it matched and how often it has been chosen for viewing.)
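The feedback scheme described above can be sketched in a few lines. This is only an illustration of the idea as stated here, not Google's actual implementation; the class `ClickRanker`, its methods, and the page names are all hypothetical.

```python
from collections import defaultdict

class ClickRanker:
    """Toy model: order matching pages by how often users chose them."""

    def __init__(self):
        # clicks[query][page] = number of times `page` was chosen for `query`
        self.clicks = defaultdict(lambda: defaultdict(int))

    def record_click(self, query, page):
        """Remember that a user chose `page` after typing `query`."""
        self.clicks[query][page] += 1

    def rank(self, query, matching_pages):
        """Return the matching pages, most-clicked first.

        Pages never clicked for this query count as zero; ties keep
        their original order because Python's sort is stable.
        """
        counts = self.clicks.get(query, {})
        return sorted(matching_pages, key=lambda page: -counts.get(page, 0))

# Hypothetical usage: three users pick the recruiter page, one the tribes page.
ranker = ClickRanker()
for _ in range(3):
    ranker.record_click("headhunter", "recruiter.example")
ranker.record_click("headhunter", "tribes.example")

pages = ["band.example", "tribes.example", "recruiter.example"]
print(ranker.rank("headhunter", pages))
```

The nested table makes the storage cost visible: one counter per (query, page) pair that was ever clicked.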

User feedback also explains why the simple query "headhunter" returns links to employment agencies at the top. Far more people are looking for a job than for stories about primitive tribes!

User feedback is critical to the success of Google. Suppose we are looking for stories about a dog named Lucy. Typing "dog named Lucy" on Google is too restrictive because the word "named" may not appear in a story. And using all three words does not guarantee that we will get what we want. When I did that on Google, one of the stories that was returned contained the passage: "Lucy and I spent the weekend alone together. We have a dog named Kyler." Insisting on the exact phrase "dog named Lucy" does not produce any unwanted returns, but it misses a lot of stories. It is better to try "dog Lucy". That did produce some unwanted returns (for example: "Ask Lucy - How to feed a senior dog") but the first 22 returns were all about a dog named Lucy!
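The difference between requiring all the words and requiring the exact phrase can be shown with a small sketch. The helper names (`words`, `matches_all_words`, `matches_phrase`) are made up for illustration, and the tokenization is a simplifying assumption; real search engines do considerably more.

```python
import re

def words(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_all_words(query, text):
    """True if every query word appears somewhere in the text."""
    return set(words(query)) <= set(words(text))

def matches_phrase(query, text):
    """True if the query words appear consecutively, in order."""
    q, t = words(query), words(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

# The two examples quoted in the paragraph above.
kyler = "Lucy and I spent the weekend alone together. We have a dog named Kyler."
ask_lucy = "Ask Lucy - How to feed a senior dog"

print(matches_all_words("dog named Lucy", kyler))   # all three words occur
print(matches_phrase("dog named Lucy", kyler))      # but not as a phrase
print(matches_all_words("dog Lucy", ask_lucy))      # unwanted match
```

Neither test looks at meaning: the first accepts the Kyler story because the three words happen to be present, and the second rejects any story that describes Lucy the dog in different words.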

A computer cannot search for stories with a given meaning but only for stories with given words. It is human intervention that enables a search for meaning. At one end is the construction of the query by the user and at the other end is the feedback from previous users of the system.

Looking for Pictures

It would be nice if we could ask a computer to find all pictures similar to a given example. For example, I would like to find all pictures stored on my PC where my wife appears, and I would like to do that by showing the machine a picture of my wife. Such an operation is called Content-Based Image Retrieval (CBIR) and, in spite of many claims to the contrary, it remains an unsolved problem. Today the only practical way to find pictures is through tags, text labels attached to the picture files.

First Posted: May 11, 2010 — Latest Update: May 11, 2010