The Mathematics of Google Page Rank

Have you ever wondered how web pages are ranked in searches? Modern search engines employ methods of ranking the results to provide the “best” results first that are more elaborate than just plain text ranking, and each search engine may use a different algorithm. One of the most known and influential algorithms for computing the relevance of web pages is the Page Rank algorithm used by the Google search engine. It was invented by Larry Page and Sergey Brin while they were graduate students at Stanford, and it became a Google trademark in 1998. Page Rank ranks its pages using an algorithm primarily based on the probability of a web surfer randomly landing on a page.

Imagine four different web pages: page1.html, page2.html, page3.html, and page4.html. Each page will have an importance score based on how well it links to other pages using in-degree and Page Rank calculations. Generally speaking, each page with k outgoing links will transfer \frac{1}{k} of its importance to each of the nodes the page is linked to.

picture1

The four webpages shown in the webgraph above will be labeled as such:

  • A – page1.html. A has 3 outbound links to B, C, and D. This is node 1.
  • B – page2.html. B has 2 outbound links to C and D. This is node 2.
  • C – page3.html. C has 1 outbound link to A . This is node 3.
  • D – page4.html. D has 2 outbound links to A and C. This is node 4.

The outbound links transfer their importance from each node. In this example, node 1, or A, or page1.html, will pass on \frac{1}{3} of its importance to its three linked nodes B, C, and D. Node 2, or B, or page2.html will pass \frac{1}{2} of its importance to its two linked nodes, C and D. Node 3, or C, or page3.html will only pass a unit of 1 to its sole link to A. Node 4, or D, or page4.html, will pass \frac{1}{2} of its importance to A and C.

From a linear algebra point of view, this would look like the following:

\left\{ \begin{array}{l} x_1 = x_3 +\frac{1}{2} x_4 \\ x_2 = \frac{1}{3} x_1 \\ x_3 = \frac{1}{3} x_1 + \frac{1}{2} x_2 + \frac{1}{2} x_4 \\ x_4 = \frac{1}{3} x_1 + \frac{1}{2} x_2 \end{array} \right.

The Eigenvectors are of the form c \left| \begin{array}{c} 12 \\ 4 \\ 9 \\ 6 \end{array} \right |

We choose v_e to be the unique Eigenvector with the sum of all entries equal to 1, which will lead to our PageRank vector:

frac{1}{31} \left| \begin{array}{c} 12 \\ 4 \\ 9 \\ 6 \end{array} \right .

This yields a matrix A that looks like the following:

\left| \begin{array}{cccc} 0 0 1 \frac{1}{2} \\ \frac{1}{3}0 0 0 \\ \frac{1}{3} \frac{1}{2}  0 \frac{1}{2}\\ \frac{1}{3} \frac{1}{2} 0 0 \end{array} \right|

Suppose the initial importance of each node v is equally distributed amongst the 4 nodes, with each getting \frac{1}{4}. From there we update the rank according to the current importance value that we just calculated for A .

v=\left| \begin{array}{c} 0.25 \\ 0.25 \\ 0.25 \\ 0.25 \end{array}\right | , Av = \left| \begin{array}{c} 0.37 \\ 0.08 \\ 0.33 \\ 0.20 \end{array}\right |

v=\left| \begin{array}{c} 0.37 \\ 0.08 \\ 0.33 \\ 0.20 \end{array}\right | , A^2 v = \left| \begin{array}{c} 0.43\\ 0.12 \\ 0.27 \\ 0.16 \end{array}\right |

A^3 v = \left| \begin{array}{c} 0.35\\ 0.14 \\ 0.29 \\ 0.20 \end{array}\right |

A^4 v = \left| \begin{array}{c} 0.39\\ 0.11 \\ 0.29 \\ 0.19 \end{array}\right |

A^5 v = \left| \begin{array}{c} 0.39\\ 0.13\\ 0.28 \\ 0.19 \end{array}\right |

A^6 v = \left| \begin{array}{c} 0.38\\ 0.13 \\ 0.29 \\ 0.19 \end{array}\right |

A^7 v = \left| \begin{array}{c} 0.38\\ 0.12 \\ 0.29 \\ 0.19 \end{array}\right |

A^8 v = \left| \begin{array}{c} 0.38\\ 0.12 \\ 0.29 \\ 0.19 \end{array}\right |

Note the pattern of v, Av, ..., A^kv trends to the equilibrium value of v_{equilibrium} = \left| \begin{array}{c} 0.38\\ 0.12 \\ 0.29 \\ 0.19 \end{array}\right |

The equilibrium value is the PageRank vector of our web graph.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s