name: Phil Mitchell
email: [EMAIL PROTECTED]
userid: PHILMI
description: link checking script
discussed: I mentioned this script on libwww and there seemed to be
significant interest in it. Although there are many link-checking scripts
floating around, including one in CPAN/web, they generally conflate
spidering and link checking. My script just checks a list of URLs provided
in a file. This is a more general solution, and lets me concentrate on the
_checking_ part. The script was written to check the 10,000-plus URLs in
the Harvard library catalog, and I have worked pretty hard to weed out all
sorts of spurious error reports. For example, configurable parameters
specify how many rechecks are made and how they are spread out in time,
which lets one control how conservatively the script treats servers that
are just temporarily unavailable. My script also handles the Solaris quirk
that causes LWP (and telnet) to time out on responses from certain web
servers despite receiving the response. (A post to libwww about this has
drawn considerable interest as well.) I will admit, my Perl is not
particularly sophisticated -- but it is clean, it works, and it is
well-documented, and there are way too many people out there writing their
own link-checking scripts!
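
To illustrate the recheck idea described above, here is a minimal sketch. It is written in Python rather than Perl and is not the script itself. The names (fetch_status, check_urls, rechecks, delay_seconds) and the rule that any status below 400 counts as a good link are assumptions made for the example, not the script's actual parameters.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=30):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, malformed URL


def is_ok(status):
    """Assumed rule for this sketch: any status below 400 is a good link."""
    return status is not None and status < 400


def check_urls(urls, rechecks=2, delay_seconds=3600,
               fetch=fetch_status, sleep=time.sleep):
    """Return {url: last_status} for URLs that failed every attempt.

    Each URL is checked once. URLs that fail are rechecked up to
    `rechecks` more times, waiting `delay_seconds` before each round,
    so a server that is only temporarily down is not reported.
    """
    failing = {}
    for url in urls:
        status = fetch(url)
        if not is_ok(status):
            failing[url] = status

    for _ in range(rechecks):
        if not failing:
            break
        sleep(delay_seconds)  # spread the rechecks out in time
        still_failing = {}
        for url in failing:
            status = fetch(url)
            if not is_ok(status):
                still_failing[url] = status
        failing = still_failing

    return failing
```

Raising rechecks or delay_seconds makes the report more conservative: a link is only reported as broken if it fails the first check and every later recheck. The fetch and sleep arguments are there so the logic can be exercised without touching the network.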

regards,

phil
