2.1. LOOKING FOR BOOKS ON AMAZON

When Two Books are Related (according to Amazon)

People who shop at amazon.com are familiar with suggestions such as "You might like these books …" (when you first visit the site) or "people who bought this book, also bought …" (when completing an order). This is all done by a program that keeps track of which books people buy and links books that were bought by the same person, as illustrated in Figure 2.1.1.

Figure 2.1.1: Linking people and books

In the example of the figure, two different customers bought both the first and the last book, so these two books are considered related. Of course the computer does not draw colored lines between customers and books, but it does the equivalent by keeping arrays of data. In reality things are a bit more complicated, because books are marked as related depending on the percentage of common sales rather than on the absolute number of sales. Here is one way to do such a calculation.

Suppose we want to establish the relation between two books A and B. Let SALES(A) and SALES(B) be the total number of sales of each of these two books. Let SALES_COMMON be the number of people who bought both books. In order to express SALES_COMMON as a percentage we need to divide it by a number combining the individual sales. Reportedly, Amazon uses the geometric mean of the individual sales, that is, the square root of the product SALES(A) times SALES(B). We can write a formula for the relationship REL(A,B) of the two books as

$$\mathrm{REL}(A,B) = \frac{\mathrm{SALES\_COMMON}}{\sqrt{\mathrm{SALES}(A)\cdot\mathrm{SALES}(B)}} \tag{2.1.1}$$
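As a rough illustration of how formula (2.1.1) could be computed from purchase records, here is a minimal Python sketch. The record layout, the customer names, and the helper names `buyers_by_book` and `rel` are assumptions made for this example, not a description of Amazon's actual program. The sketch also assumes that the sales of a book can be counted as the number of distinct customers who bought it.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical purchase records: (customer, book) pairs.
purchases = [
    ("ann", "A"), ("ann", "B"),
    ("bob", "A"), ("bob", "B"),
    ("cat", "A"),
    ("dan", "B"), ("dan", "C"),
]

def buyers_by_book(records):
    """Map each book to the set of customers who bought it."""
    buyers = defaultdict(set)
    for customer, book in records:
        buyers[book].add(customer)
    return buyers

def rel(buyers, a, b):
    """Formula (2.1.1): common buyers over the geometric mean of sales."""
    sales_a = len(buyers[a])
    sales_b = len(buyers[b])
    if sales_a == 0 or sales_b == 0:
        return 0.0  # a book with no sales cannot be related to anything
    sales_common = len(buyers[a] & buyers[b])  # customers who bought both
    return sales_common / sqrt(sales_a * sales_b)

buyers = buyers_by_book(purchases)
print(round(rel(buyers, "A", "B"), 3))  # 0.667 (2 common buyers, 3 sales each)
print(round(rel(buyers, "A", "C"), 3))  # 0.0   (no common buyers)
```

Keeping a set of buyers per book plays the role of the "arrays of data" mentioned earlier: the colored lines of the figure become set memberships, and the common sales are the size of the intersection of two sets.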

Taking the geometric mean rather than the ordinary (arithmetic) average provides a better balance when one of the books has much bigger sales than the other. For example, suppose book A has sold 10,000 copies and book B only 60, but each person who bought book B has also bought book A, so the common sales are 60. The geometric mean of 10,000 and 60 is the square root of 600,000, or about 775, so the relationship number will be 60/775, or about 0.077. The average sales number is 10,060/2 or 5,030, and that yields 60/5,030 or about 0.012 for the relationship, a much smaller number. See Section 2.X for more on this formula.
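The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the comparison described above; the function names `rel_geometric` and `rel_arithmetic` are made up for the illustration.

```python
from math import sqrt

def rel_geometric(sales_a, sales_b, sales_common):
    """Common sales divided by the geometric mean of the individual sales."""
    return sales_common / sqrt(sales_a * sales_b)

def rel_arithmetic(sales_a, sales_b, sales_common):
    """Common sales divided by the ordinary average of the individual sales."""
    return sales_common / ((sales_a + sales_b) / 2)

# Book A sold 10,000 copies, book B sold 60, and all 60 buyers of B also bought A.
print(round(rel_geometric(10_000, 60, 60), 3))   # 0.077
print(round(rel_arithmetic(10_000, 60, 60), 3))  # 0.012
```

With the geometric mean the small book is not swamped by the large one: the relationship number comes out more than six times bigger than with the average.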

There are several other methods for making recommendations on the basis of customer preferences, and they are collectively referred to as Collaborative Filtering. The "Collaborative" refers to the use of several kinds of information that are available from earlier customer actions. It is to be contrasted with Content-based Filtering, which relies on information about the content of the various items: books, movies, music, etc. Content-based filtering requires human effort to add labels to the computer record of each item. Because such human labor is expensive, content-based filtering is used by few merchants.

When you shop at Amazon, the important thing to remember is that customer actions, rather than any content analysis, determine, in the eyes of the seller, whether two books are on related subjects or not.

Here is a story that illustrates how such methods may fail. I am a member of a reading group that decided to read, in successive months, two books that were unrelated to each other and not particularly popular. As a result, the sales to our group were a significant part of the overall sales of the two books, and that led amazon.com to recommend each book as related to the other.
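To see why a small group can have such an effect, here is a short Python sketch with made-up numbers (the actual sales of the two books are not known to me). It assumes the relationship is computed as common sales divided by the geometric mean of the individual sales, as described earlier in this section.

```python
from math import sqrt

def relationship(sales_a, sales_b, sales_common):
    """Common sales divided by the geometric mean of the individual sales."""
    return sales_common / sqrt(sales_a * sales_b)

# Hypothetical numbers: two obscure books with 40 sales each,
# 10 of them to the same reading group.
obscure_pair = relationship(40, 40, 10)

# Hypothetical numbers: two popular books with 10,000 sales each
# and 500 customers in common.
popular_pair = relationship(10_000, 10_000, 500)

print(obscure_pair)  # 0.25
print(popular_pair)  # 0.05
```

Under these assumed numbers, ten common buyers are enough to make the two obscure books look five times more related than two popular books that share 500 buyers.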
