Why don't you just update your robots.txt to explicitly specify which files you do or don't allow spiders to access? If it's a rule-abiding spider, that will be the end of it.
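For example, something like this in your robots.txt should do it (the "/private/" path is just for illustration; adjust it to whatever you want to keep spiders out of):

  # Block the LinkWalker agent entirely
  User-agent: LinkWalker
  Disallow: /

  # Example: keep all other spiders out of one directory
  User-agent: *
  Disallow: /private/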
On Sun, Dec 23, 2001 at 05:41:47PM +0100, Russell Coker wrote:
> I have a nasty web spider with an agent name of "LinkWalker" downloading
> everything on my site (including .tgz files). Does anyone know anything
> about it?
>
> I've added the following to my firewall setup to stop further attacks...
>
> # crappy LinkWalker - evil spider that downloads every file including .tgz on
> # the site
> iptables -A INPUT -j logitrej -p tcp -s 209.167.50.25 -d 0.0.0.0/0 --dport www
>
> --
> http://www.coker.com.au/bonnie++/       Bonnie++ hard drive benchmark
> http://www.coker.com.au/postal/         Postal SMTP/POP benchmark
> http://www.coker.com.au/projects.html   Projects I am working on
> http://www.coker.com.au/~russell/       My home page

--
Nick Jennings