bryan rasmussen wrote:
> Hi,
>
> Sorry, I was imprecise: I meant one that does not save the downloaded
> page locally. There probably isn't one, though, so I should build one
> myself. Probably all I need is a good crawler that can be set to dump
> all links into a dataset I can analyse with R.
>
> Cheers,
> Bryan Rasmussen
There are quite a few already: webchecker, Orchid, mechanize, mygale:

http://codesnipers.com/?q=node/223&&title=Detecting-Dead-Links
http://pxr.openlook.org/pxr/source/Tools/webchecker/
http://sig.levillage.org/?p=599
http://www.robertblum.com/articles/2005/11/21/challenge-map-i-python-web-scraping
http://www.rexx.com/~dkuhlman/quixote_htmlscraping.html
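
If none of those fits the bill, the kind of crawler described above is
also quick to sketch with nothing but the standard library. What follows
is a minimal illustration (Python 3), not a finished tool: the page body
is kept in memory only and never saved to disk, and every href found is
written as a source/target pair to a CSV file that R can load with
read.csv(). The starting URL and the "links.csv" filename are
placeholders to adapt.

import csv
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags; the page itself is discarded."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))

def collect_links(url):
    # The response body lives in memory only; nothing touches the disk.
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    parser = LinkCollector(url)
    parser.feed(html)
    return parser.links

if __name__ == "__main__":
    start = "http://example.com/"  # placeholder starting page
    with open("links.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "target"])  # header row for read.csv
        for link in collect_links(start):
            writer.writerow([start, link])

From R, read.csv("links.csv") then gives you the edge list directly. A
real crawler would add a queue of unvisited links, duplicate detection,
and some politeness (robots.txt, rate limiting) on top of this.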