On 8 Jan 2002, at 9:56, Jesse Goerz wrote:

> On Tuesday 08 January 2002 01:38, Russell Coker wrote:
> > On Mon, 7 Jan 2002 23:31, Nathan Strom wrote:
> > > > I have a nasty web spider with an agent name of
> > > > "LinkWalker" downloading everything on my site
> > > > (including .tgz files).  Does anyone know anything
> > > > about it?
> > >
> > > It's apparently a link-validation robot operated by a
> > > company called SevenTwentyFour Incorporated, see:
> > > http://www.seventwentyfour.com/tech.html
> >
> > Oops.
> >
> > Actually they sent me an offer of a free trial to their
> > service (which seems quite useful).  The free trial gave
> > me some useful stats and let me fix a bunch of broken
> > links (of course I didn't pay).
>
> You can do the same thing with wget:
>
>     --spider
>         When invoked with this option, Wget will behave as a
>         Web spider, which means that it will not download the
>         pages, just check that they are there.  You can use it
>         to check your bookmarks, e.g. with:
>
>             wget --spider --force-html -i bookmarks.html
>
>         This feature needs much more work for Wget to get
>         close to the functionality of real WWW spiders.
>
> You'll be checking more than bookmarks, but you get the idea.
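As the quoted wget documentation notes, `--spider` only checks that pages exist; a "real WWW spider" also has to parse each fetched page and collect the links to follow. As a rough illustration of that missing step (not part of wget, and using only the Python standard library), a minimal link extractor might look like this; the sample HTML and the `extract_links` helper are hypothetical:

```python
# Sketch of the link-extraction step a real spider needs beyond
# "wget --spider": parse a page and collect the link targets,
# resolved against the page's base URL.  Stdlib only.
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects href targets of <a> tags, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    """Return all <a href> targets in html as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links

# Hypothetical sample page; in a real spider this would come from an
# HTTP GET, and each collected link would then be checked (e.g. with
# a HEAD request) and recursed into if it is on the same site.
sample = '<a href="/docs/">Docs</a> <a href="http://example.org/x.tgz">x</a>'
print(extract_links(sample, "http://example.com/"))
# -> ['http://example.com/docs/', 'http://example.org/x.tgz']
```

For a whole-site check with wget itself, combining `--spider` with recursive retrieval (e.g. `wget --spider -r http://yoursite/`) and grepping the log for errors gets close to what the commercial service reports.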
In case you are running ht://dig, there's an add-on on the contributed works page that parses htdig's output and generates a broken-links report from it. Since htdig touches every link anyway, it's a natural fit.

Cheers,
Marcel
--
Enjoy Debian/GNU Linux - Now even on the 5 Euro banknote!