# VSnake notes

Записки программиста, обо всем и ни о чем. Но, наверное, больше профессионального.

## 2008-02-01

### кому слава гугля спать не дает?

Нравится мне питон (язык программирования). И вот на глаза попалось:

.
The Google PageRank Algorithm in 126 Lines of Python
Reading How Google Finds Your Needle in the Web's Haystack I was surprised by the simplicity of the math underlying the google PageRank algorithm, and the ease with which it seemed to be efficiently implementable. Being able to do a google-style ranking seems useful for a wide range of cases, and since I had wanted to take a look at python for numerics for some time, I decided to give it a shot (note that there already exists a python implementation using the numarray module).

© 2006, Vincent Kräutler.

.

## The Code

All of the following is done in python, with the numpy library, so don't forget to

`from numpy import *`

First, we need to transform the matrix of outgoing links into one of incoming links, while preserving the number of outgoing links per page and identifying leaf nodes:

`def transposeLinkMatrix(       outGoingLinks = [[]]       ):       """       Transpose the link matrix. The link matrix contains the pages each page points to.       However, what we want is to know which pages point to a given page. However, we       will still want to know how many links each page contains (so store that in a separate array),       as well as which pages contain no links at all (leaf nodes).       @param outGoingLinks outGoingLinks[ii] contains the indices of pages pointed to by page ii       @return a tuple of (incomingLinks, numOutGoingLinks, leafNodes)       """       nPages = len(outGoingLinks)       # incomingLinks[ii] will contain the indices jj of the pages linking to page ii       incomingLinks = [[] for ii in range(nPages)]       # the number of links in each page       numLinks = zeros(nPages, int32)       # the indices of the leaf nodes       leafNodes = []       for ii in range(nPages):               if len(outGoingLinks[ii]) == 0:                       leafNodes.append(ii)               else:                         numLinks[ii] = len(outGoingLinks[ii])                       # transpose the link matrix                       for jj in outGoingLinks[ii]:                               incomingLinks[jj].append(ii)                             incomingLinks = [array(ii) for ii in incomingLinks]       numLinks = array(numLinks)       leafNodes = array(leafNodes)                                     return incomingLinks, numLinks, leafNodes`
Next, we need to perform the iterative refinement. Wrapping it up in a generator to hide the state of the iteration behind a function-like interface. Single-precision should be good enough.
`def pageRankGenerator(       At = [array((), int32)],       numLinks = array((), int32),       ln = array((), int32),       alpha = 0.85,       convergence = 0.01,       checkSteps = 10       ):       """       Compute an approximate page rank vector of N pages to within some convergence factor.       @param At a sparse square matrix with N rows. At[ii] contains the indices of pages jj linking to ii.       @param numLinks iNumLinks[ii] is the number of links going out from ii.       @param ln contains the indices of pages without links @param alpha a value between 0 and 1. Determines the relative importance of "stochastic" links. @param convergence a relative convergence criterion. smaller means better, but more expensive.       @param checkSteps check for convergence after so many steps       """       # the number of "pages"       N = len(At)       # the number of "pages without links"       M = ln.shape       # initialize: single-precision should be good enough       iNew = ones((N,), float32) / N       iOld = ones((N,), float32) / N       done = False       while not done:               # normalize every now and then for numerical stability               iNew /= sum(iNew)               for step in range(checkSteps):                       # swap arrays                       iOld, iNew = iNew, iOld                       # an element in the 1 x I vector.                       # all elements are identical.                       oneIv = (1 - alpha) * sum(iOld) / N                       # an element of the A x I vector.                       # all elements are identical.                       oneAv = 0.0                       if M > 0:                               oneAv = alpha * sum(iOld.take(ln, axis = 0)) * M / N                       # the elements of the H x I multiplication                       ii = 0                       while ii < N:                               page = At[ii]                               h = 0                               if page.shape:                                       h = alpha * dot(                                               iOld.take(page, axis = 0),                                               1. / numLinks.take(page, axis = 0)                                               )                               iNew[ii] = h + oneAv + oneIv                               ii += 1               diff = iNew - iOld               done = (sqrt(dot(diff, diff)) / N < convergence)               yield iNew`
Finally, we might want to wrap up the above behind a convenience interface like so:
`def pageRank(       linkMatrix = [[]],       alpha = 0.85,       convergence = 0.01,       checkSteps = 10       ):       """       Convenience wrap for the link matrix transpose and the generator.       """       incomingLinks, numLinks, leafNodes = transposeLinkMatrix(linkMatrix)       for gr in pageRankGenerator(incomingLinks, numLinks, leafNodes,   alpha = alpha,   convergence = convergence,   checkSteps = checkSteps):               final = gr       return final`

.
.

## Ярлыки

linux (241) python (191) citation (186) web-develop (170) gov.ru (159) video (124) бытовуха (114) sysadm (100) GIS (97) Zope(Plone) (88) бурчалки (84) Book (82) programming (82) грабли (77) development (73) Fun (72) windsurfing (72) Microsoft (64) hiload (62) opensource (58) security (57) опыт (55) movie (52) Wisdom (51) ML (47) language (45) hardware (44) money (42) JS (41) driving (41) curse (40) bigdata (39) DBMS (38) ArcGIS (34) history (31) PDA (30) howto (30) holyday (29) Google (27) Oracle (27) tourism (27) virtbox (27) health (26) vacation (24) AI (23) Autodesk (23) SQL (23) Java (22) humor (22) knowledge (22) translate (20) CSS (19) cheatsheet (19) hack (19) Apache (16) Manager (15) web-browser (15) Никонов (15) happiness (14) music (14) todo (14) PHP (13) course (13) scala (13) weapon (13) HTTP. Apache (12) SSH (12) frameworks (12) hero (12) im (12) settings (12) HTML (11) SciTE (11) USA (11) crypto (11) game (11) map (11) HTTPD (9) ODF (9) купи/продай (9) Photo (8) benchmark (8) 3D (7) CS (7) DNS (7) NoSQL (7) cloud (7) django (7) gun (7) matroska (7) telephony (7) VCS (6) bluetooth (6) pidgin (6) proxy (6) Donald Knuth (5) ETL (5) NVIDIA (5) REST (5) bash (5) flash (5) keyboard (5) price (5) samba (5) CGI (4) LISP (4) RoR (4) cache (4) display (4) holywar (4) nginx (4) pistol (4) spark (4) xml (4) Лебедев (4) IDE (3) IE8 (3) J2EE (3) NTFS (3) RDP (3) holiday (3) mount (3) Гоблин (3) кухня (3) урюк (3) AMQP (2) ERP (2) IE7 (2) NAS (2) Naudoc (2) PDF (2) address (2) air (2) british (2) coffee (2) font (2) ftp (2) messaging (2) notify (2) sharepoint (2) ssl/tls (2) stardict (2) tests (2) tunnel (2) udev (2) APT (1) CRUD (1) Canyonlands (1) Cyprus (1) DVDShrink (1) Jabber (1) K9Copy (1) Matlab (1) Palanga (1) Portugal (1) VBA (1) WD My Book (1) autoit (1) bike (1) cannabis (1) chat (1) concurrent (1) dbf (1) ext4 (1) idioten (1) join (1) krusader (1) license (1) mindmap (1) navitel (1) quiz (1) regexp (1) robot (1) science (1) spatial (1) tie (1) vim (1) Науру (1) крысы (1) налоги (1) пианино (1)