On Tue, Jul 25, 2006 at 02:45:28PM -0700, prad wrote:
> what is the best way to stop those robots and spiders from getting in?
> 
> .htaccess?
> robots.txt and apache directives?
> find them on the access_log and block with pf?
> 
> i should also ask whether it is a good idea to block robots in the first
> place, since some do help to increase presence on the web.
> which are good robots and which are bad?

Almost all "real" robots will obey robots.txt, and that should be your
first attempt.

The ones that do not obey robots.txt will probably not obey anything
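else, either. If you block them with pf, try "block return" instead of
"block drop": the robot then gets an immediate TCP reset (or ICMP
unreachable for UDP) rather than waiting out a timeout, so maybe they'll
give up quicker.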
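If you do want to go the access_log route, one rough way to find candidates for a pf table is to replay the log against your own robots.txt and collect the addresses that fetched disallowed paths. Below is a minimal Python sketch, assuming Apache's common/combined log format and a made-up robots.txt (the /private/ and /cgi-bin/ paths and the function name find_offenders are hypothetical). It's only a heuristic: a human following a link into /private/ gets flagged too, so eyeball the list before blocking anything.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt policy; substitute your real one.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

# Leading fields of Apache common/combined format:
# host ident user [time] "METHOD /path PROTOCOL" ...
LOG_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)')


def find_offenders(log_lines, robots_txt=ROBOTS_TXT):
    """Return sorted client IPs that requested paths robots.txt disallows for '*'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    offenders = set()
    for line in log_lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that aren't in the expected log format
        if not parser.can_fetch("*", m.group("path")):
            offenders.add(m.group("ip"))
    return sorted(offenders)


if __name__ == "__main__":
    sample = [
        '192.0.2.10 - - [25/Jul/2006:14:45:28 -0700] "GET /private/notes.html HTTP/1.1" 200 512 "-" "BadBot/1.0"',
        '198.51.100.7 - - [25/Jul/2006:14:46:01 -0700] "GET /index.html HTTP/1.1" 200 1024 "-" "Googlebot/2.1"',
    ]
    print(find_offenders(sample))  # ['192.0.2.10']
```

The resulting addresses could then be loaded into a pf table (for example with `pfctl -t badbots -T add 192.0.2.10`, where `badbots` is a hypothetical table name) that a "block return" rule in pf.conf refers to.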

As for whether you should block them or not, that's up to you. I am
currently blocking Yahoo Slurp in robots.txt (yes, it works) because
Yahoo has always been irrelevant to my traffic, their robot is
incredibly obnoxious, and every one of their referrals leaves off the
trailing slash for directories. Other than that I let them all come and
for the most part they behave well.
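For what it's worth, a robots.txt along those lines can be sanity-checked with Python's standard urllib.robotparser before you put it on the server. This is a minimal sketch: "Slurp" is the token Yahoo documented for its crawler, but treat that as an assumption and confirm it against your own access_log; the helper name allowed() is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# robots.txt that shuts out Yahoo's crawler and admits everyone else.
# The "Slurp" token is an assumption; check your logs for the exact name.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /

User-agent: *
Disallow:
"""


def allowed(agent_token, url, robots_txt=ROBOTS_TXT):
    """Return True if a robot identifying as agent_token may fetch url.

    robotparser matches rules against the product token (the part of the
    User-Agent before the first "/"), so pass e.g. "Slurp", not the full
    header string.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    for bot in ("Slurp", "Googlebot", "msnbot"):
        print(bot, allowed(bot, "http://www.example.com/docs/"))
```

An empty "Disallow:" line means "nothing is disallowed", which is how the catch-all record lets every other robot in.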

And... this is *REALLY* not an OpenBSD topic, and there's a *LOT* that's
been written about this topic in other places.

-- 
Darrin Chandler            |  Phoenix BSD Users Group
[EMAIL PROTECTED]   |  http://bsd.phoenix.az.us/
http://www.stilyagin.com/  |
