Have you ever wondered how web pages are ranked in searches? Modern search engines employ methods of ranking the results to provide the “best” results first that are more elaborate than just plain text ranking, and each search engine may use a different algorithm. One of the most known and influential algorithms for computing the relevance of web pages is the Page Rank algorithm used by the Google search engine. It was invented by Larry Page and Sergey Brin while they were graduate students at Stanford, and it became a Google trademark in 1998. Page Rank ranks its pages using an algorithm primarily based on the probability of a web surfer randomly landing on a page.
Imagine four different web pages: page1.html, page2.html, page3.html, and page4.html. Each page will have an importance score based on how well it links to other pages using in-degree and Page Rank calculations. Generally speaking, each page with k outgoing links will transfer of its importance to each of the nodes the page is linked to.
The four webpages shown in the webgraph above will be labeled as such:
- A – page1.html. A has 3 outbound links to B, C, and D. This is node 1.
- B – page2.html. B has 2 outbound links to C and D. This is node 2.
- C – page3.html. C has 1 outbound link to A . This is node 3.
- D – page4.html. D has 2 outbound links to A and C. This is node 4.
The outbound links transfer their importance from each node. In this example, node 1, or A, or page1.html, will pass on of its importance to its three linked nodes B, C, and D. Node 2, or B, or page2.html will pass of its importance to its two linked nodes, C and D. Node 3, or C, or page3.html will only pass a unit of 1 to its sole link to A. Node 4, or D, or page4.html, will pass of its importance to A and C.
From a linear algebra point of view, this would look like the following:
The Eigenvectors are of the form
We choose to be the unique Eigenvector with the sum of all entries equal to 1, which will lead to our PageRank vector:
This yields a matrix that looks like the following:
Suppose the initial importance of each node is equally distributed amongst the 4 nodes, with each getting . From there we update the rank according to the current importance value that we just calculated for .
Note the pattern of trends to the equilibrium value of
The equilibrium value is the PageRank vector of our web graph.