On 7/25/06, prad <[EMAIL PROTECTED]> wrote:
> what is the best way to stop those robots and spiders from getting in?
The only sure way to stop robots and spiders is to shut down your web server; I don't suppose that's the answer you're looking for. Treat malicious robots the same way you would treat malicious or unwelcome human users. Whatever your definition of malicious, don't expect to easily discern between regular human users and robots: user-agent strings and similar headers are trivial to alter, so don't rely on them without precautions (as with all client-generated input).
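To give a rough idea (the "BadBot" string and document root are just placeholders, and this assumes a pre-2.4 Apache with mod_setenvif loaded), blocking by user-agent looks something like the following. Keep in mind it only stops robots that announce themselves honestly:

    # httpd.conf -- refuse requests from a self-identified robot
    SetEnvIfNoCase User-Agent "BadBot" bad_bot
    <Directory "/var/www/htdocs">
        Order allow,deny
        Allow from all
        Deny from env=bad_bot
    </Directory>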
> .htaccess?
That might help, but it won't solve the problem of discerning between human and automated clients. The usual problems/threats regarding credentials apply as well, and mind you, automated processes (robots) can use credentials too. You could also add a CAPTCHA; various modules (PHP, Perl) exist that make these easy to integrate. Whether (or when) robots will be able to fool those tests is another matter.
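As a minimal sketch (the realm name and file paths are assumptions, and it presumes AllowOverride AuthConfig is permitted for that directory), password protection via .htaccess could look like:

    # .htaccess -- require a valid user for this directory and below
    AuthType Basic
    AuthName "Members only"
    # create the password file with: htpasswd -c /var/www/.htpasswd someuser
    AuthUserFile /var/www/.htpasswd
    Require valid-user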
> robots.txt and Apache directives?
Well-behaved robots will adhere to measures such as (X)HTML meta tags, robots.txt files, and the like. Other robots may not.
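For what it's worth, a minimal robots.txt (the paths are just examples) and the per-page meta tag equivalent look like this; again, only well-behaved robots will honour them:

    # robots.txt -- goes in the document root
    User-agent: *
    Disallow: /private/
    Disallow: /cgi-bin/

    <!-- per-page equivalent, inside the <head> of an (X)HTML document -->
    <meta name="robots" content="noindex,nofollow" />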
> find them in the access_log and block with pf?
Using access_log means you're acting on information gathered after the fact; by the time you block an address, the robot has already made its requests. It may still help against repeat visitors, though.
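If you do go that route, a sketch along these lines might work (the interface name, table name and example address are assumptions; adjust to your setup):

    # pf.conf -- drop traffic from addresses collected in the <badbots> table
    ext_if = "fxp0"          # adjust to your external interface
    table <badbots> persist
    block in quick on $ext_if from <badbots> to any

    # after spotting an offender in access_log, add it by hand (or from a script):
    pfctl -t badbots -T add 192.0.2.10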
> which are good robots and which are bad?
Apart from robots/spiders potentially being excellent friends, allowing them in (e.g. Google) may also have undesirable side effects. These range from outdated information being shown to search-engine users, to sensitive data being stored on servers outside your influence; I'm sure there are many more. I'd recommend you think about your threat model first and use that to determine which information you deem sensitive and to what lengths you will go to secure it.

Cheers,
Rogier

--
If you don't know where you're going, any road will get you there.