On Tue, Jul 25, 2006 at 02:45:28PM -0700, prad wrote:
> what is the best way to stop those robots and spiders from getting in?
>
> .htaccess?
> robot.txt and apache directives?
> find them on the access_log and block with pf?
>
> i should also ask whether it is a good idea to block robots in the first place
> since some do help to increase presence on the web.
> which are good robots and which are bad?
Almost all "real" robots will obey robots.txt, and that should be your
first attempt. The ones that do not obey robots.txt will probably not
obey anything else, either. If you block them with pf, try "block return"
instead of "block drop" and maybe they'll give up quicker.

As for whether you should block them or not, that's up to you. I am
currently blocking Yahoo Slurp in robots.txt (yes, it works) because
Yahoo has always been irrelevant to my traffic, their robot is incredibly
obnoxious, and every one of their referrals leaves off the trailing slash
for directories. Other than that I let them all come and for the most
part they behave well.

And... this is *REALLY* not an OpenBSD topic, and there's a *LOT* that's
been written about this topic in other places.

-- 
Darrin Chandler            |  Phoenix BSD Users Group
[EMAIL PROTECTED]          |  http://bsd.phoenix.az.us/
http://www.stilyagin.com/  |
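For illustration only: a minimal sketch of the kind of robots.txt that shuts out Yahoo's crawler while letting every other well-behaved robot in, checked the way a compliant crawler would check it, using Python's standard urllib.robotparser module. The file contents, the example.com URL, and the allowed() helper are hypothetical; "Slurp" is the user-agent token Yahoo's crawler matched on in robots.txt. Note that this only affects robots that choose to read and obey the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse Yahoo's crawler ("Slurp") everywhere,
# and allow every other robot. An empty Disallow line means "allow all".
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /

User-agent: *
Disallow:
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if a robots.txt-obeying crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines instead of fetching it over HTTP.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Slurp", "Googlebot"):
        print(agent, allowed(agent, "http://example.com/some/page.html"))
```

Run as a script, this prints that Slurp is refused and Googlebot is allowed. Robots that ignore robots.txt never perform this check, which is why they have to be handled elsewhere (e.g. with pf).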