2.X. MATHEMATICAL ASIDES

More on the Method Used by Amazon

The Amazon formula for determining the relation between two books is actually more general than the form given in Section 2.1, and we describe the general form here.

For each item (book, DVD, etc.) construct a vector UA that has as many components as there are users, where each component equals the rating given by that user. If a user has given no rating, the entry is 0. The relationship between two items is then expressed as the cosine of the angle between the two vectors:

$$\cos(A,B) = \frac{\langle U_A, U_B \rangle}{\|U_A\| \, \|U_B\|} \tag{2.X.1}$$

The numerator is the scalar product of the two vectors, namely the sum of the products of their respective components. If the components are either 0 (the user has not bought the item) or 1 (the user has bought the item), then the sum equals the number of terms where both components are 1, in other words the number of users who bought both items. ||UA|| is the norm of the vector UA, which equals the square root of the sum of the squares of its components. Again, when the components are either 0 or 1, the norm is the square root of the total sales for item A, and in this way Equation (2.X.1) reduces to Equation (2.1.1).
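The reduction to the 0/1 case can be checked numerically. The following is a minimal Python sketch, not Amazon's actual implementation: the function name cosine_similarity, the five-user purchase vectors, and the choice to return 0 for an item with no ratings are all assumptions made for illustration.

```python
import math

def cosine_similarity(u_a, u_b):
    """Cosine of the angle between two rating vectors (one component per user)."""
    if len(u_a) != len(u_b):
        raise ValueError("both vectors need one component per user")
    # Numerator: scalar product, the sum of component-wise products.
    dot = sum(a * b for a, b in zip(u_a, u_b))
    # Norms: square root of the sum of squared components.
    norm_a = math.sqrt(sum(a * a for a in u_a))
    norm_b = math.sqrt(sum(b * b for b in u_b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an item nobody rated is treated as unrelated to everything.
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical purchase data for five users (1 = bought, 0 = did not buy).
item_a = [1, 1, 0, 1, 0]
item_b = [1, 0, 0, 1, 1]

similarity = cosine_similarity(item_a, item_b)

# In the 0/1 case the same value follows from simple counts:
# (users who bought both) / sqrt(sales of A * sales of B).
bought_both = sum(1 for a, b in zip(item_a, item_b) if a == 1 and b == 1)
binary_form = bought_both / math.sqrt(sum(item_a) * sum(item_b))

print(similarity, binary_form)
```

With these vectors two users bought both items and each item sold three copies, so both expressions evaluate to 2/3 up to floating-point rounding.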

Using Relative Ratings

Let R(i,j) denote the rating given to item i by user j, and let Rav(j) denote the average rating given by user j. Then we can predict the rating that user A would give to item M by the following expression:

$$R(M,A) = R_{av}(A) + \frac{1}{K} \sum_{j} \left[ R(M,j) - R_{av}(j) \right] \tag{2.X.2}$$

The sum is taken over the K users j that are the nearest neighbors of A. Alternatively, one could apply the regression line method to the relative ratings of the K nearest neighbors.
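As an illustration of Equation (2.X.2), here is a minimal Python sketch. The dictionary layout, the function names average_rating and predict_rating, and the sample ratings are hypothetical. The sketch also assumes that the K nearest neighbors have already been chosen by some other means and that every one of them has rated item M.

```python
def average_rating(ratings_by_user, user):
    """Rav(j): mean of all ratings given by one user."""
    values = list(ratings_by_user[user].values())
    return sum(values) / len(values)

def predict_rating(ratings_by_user, target_user, item, neighbors):
    """Target user's average plus the neighbors' mean offset for the item."""
    k = len(neighbors)
    if k == 0:
        raise ValueError("at least one neighbor is required")
    # Each neighbor contributes how far above or below their own average
    # they rated the item. Assumes every neighbor has rated the item.
    offset_sum = sum(
        ratings_by_user[j][item] - average_rating(ratings_by_user, j)
        for j in neighbors
    )
    return average_rating(ratings_by_user, target_user) + offset_sum / k

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "A":  {"X": 4, "Y": 2},          # average 3
    "u1": {"M": 5, "X": 3},          # average 4, offset for M is +1
    "u2": {"M": 2, "Y": 4},          # average 3, offset for M is -1
    "u3": {"M": 4, "X": 2, "Y": 3},  # average 3, offset for M is +1
}

predicted = predict_rating(ratings, "A", "M", ["u1", "u2", "u3"])
print(predicted)
```

Working with offsets from each user's own average compensates for users who rate consistently high or low: a generous rater's 4 and a strict rater's 4 then carry different weight.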

Finding the Regression Line

If we denote the data points as {xi, yi}, then the coefficients of the line, s (slope) and b (y-intercept), are those that minimize the sum of the squared errors:

$$\sum_{i} (y_i - s x_i - b)^2 \tag{2.X.3}$$
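The minimizing coefficients have a standard closed form, which is not derived in this section: the slope is the sum of (xi - mean x)(yi - mean y) divided by the sum of (xi - mean x) squared, and the intercept is mean y minus s times mean x. The Python sketch below applies that formula. The function names fit_line and squared_error and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope s and intercept b for the points (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s = sxy / sxx
    b = mean_y - s * mean_x
    return s, b

def squared_error(xs, ys, s, b):
    """The quantity being minimized: the sum of (yi - s*xi - b)^2."""
    return sum((y - s * x - b) ** 2 for x, y in zip(xs, ys))

# Hypothetical data points lying close to a straight line.
xs = [1, 2, 3, 4]
ys = [3.1, 4.9, 7.2, 8.8]

s, b = fit_line(xs, ys)
best = squared_error(xs, ys, s, b)
print(s, b, best)
```

Nudging s or b away from the fitted values and recomputing squared_error should never produce a smaller result, which is a quick way to sanity-check the fit.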
