On Tue, Feb 18, 2003 at 01:09:34PM -0600, Drew Scott Daniels wrote:
> > > A robots.txt might tell e-mail harvesters to exclude the script that
> > > generates e-mail addresses.
> >
> > IMHO, a robot that is doing something as antisocial as trawling for
> > e-mail addresses is not likely to be heeding a robots.txt file.
>
> I don't think it'll eliminate harvesting, but it might help reduce it. A
> robots.txt on the archives would be nearly worthless, but one on a script
> would be more likely to work. E-mail harvesting on scripts can be risky.
> If you read a history on robots.txt you'll see many examples that show if
> a harvester or crawler does not heed the robots.txt they can wreak havoc
> on some scripts and get caught in endless loops. I don't see this kind of
> thing happening now, so it leads me to believe that even antisocial
> "trawlers" don't tend to run scripts listed in robots.txt.
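
For reference, the kind of rule being proposed might look like the robots.txt embedded in the sketch below. The script path /cgi-bin/show-address and the host lists.example.org are made up for illustration, and Python's urllib.robotparser stands in for a well-behaved crawler. A harvester that simply ignores robots.txt is not affected by any of this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the script path is an assumption, not the
# real location of any address-generating script.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/show-address
"""


def allowed(url: str, agent: str = "*") -> bool:
    """Return True if a robots.txt-respecting crawler may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The script is excluded; the static archive pages are not.
    print(allowed("http://lists.example.org/cgi-bin/show-address?id=1"))
    print(allowed("http://lists.example.org/archive/msg00001.html"))
```

The rule only excludes the script, so the archives themselves stay crawlable by search engines.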

Well, actually, I've seen viewcvs/cvsweb get crawled on several occasions, with the requested URLs getting obscenely long after a while. The crawlers also gave up after a while.

The whole spam business can hardly be stopped by non-forceful restrictions. Heck, I'd bet that a good part of the e-mail addresses in those lists they sell are randomly generated, since I watch [EMAIL PROTECTED] get double-bounced on my mail hosts all the time.

-- 
2. That which causes joy or happiness.