tags 431625 + pending
thanks

On Fri, 2007-07-06 at 16:22 +0200, Arthur de Jong wrote:
> I can reproduce the problem with reading webcheck.dat but for now not
> the crash while crawling (test is ongoing).

I have not found anything while crawling www.nl.debian.org, but the
crawl is still ongoing. I have a webcheck.dat file here:
http://arthurenhella.demon.nl/webcheck/www.nl.debian.org/webcheck.dat.bz2
but crawling is not very fast.

> Your report however points out another problem: reading webcheck.dat
> shouldn't take 30 minutes. I've done some quick tests but I haven't
> been able to pinpoint this one yet. There seems to be something going
> wrong with the buffering.

It turned out that the data types that were used were not very
efficient. I've switched to using sets instead of lists, which speeds up
reading the webcheck.dat file by a factor of 25 (at least for the huge
one of debian.org).

-- 
-- arthur - [EMAIL PROTECTED] - http://people.debian.org/~adejong --
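
To illustrate why switching from lists to sets can help this much, here is
a minimal sketch. It is not webcheck's actual code: the function names and
the example.org URLs are hypothetical, and it only assumes that loading
involves checking whether a URL has been seen before. A membership test
on a list scans the whole list, while a set uses a hash lookup.

```python
import timeit


def dedupe_with_list(urls):
    """Collect unique URLs; each 'in' test scans the list (O(n))."""
    seen = []
    for url in urls:
        if url not in seen:
            seen.append(url)
    return seen


def dedupe_with_set(urls):
    """Collect unique URLs; each 'in' test is a hash lookup (O(1) on average)."""
    seen = set()
    for url in urls:
        if url not in seen:
            seen.add(url)
    return seen


# Hypothetical input: 3000 entries, of which 1000 are distinct.
urls = ["http://example.org/page%d" % (i % 1000) for i in range(3000)]

list_time = timeit.timeit(lambda: dedupe_with_list(urls), number=1)
set_time = timeit.timeit(lambda: dedupe_with_set(urls), number=1)
print("list: %.4fs  set: %.4fs" % (list_time, set_time))
```

The actual ratio depends on the data and the machine. The gap grows with
the number of distinct entries, which would fit the largest gain showing
up on a file as big as the debian.org one.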