There's no guarantee that crawlers will be polite and honor robots.txt
directives; the search-engine ones probably do, but the spammers' ones
definitely don't, and in fact probably pay special attention to what's excluded.
(I have a honeypot entry in my robots.txt designed to catch and then block
the malicious robots; there's a sketch of the idea below.) OTOH, since the User-Agent data is also only as reliable
as the intent of whoever sets the crawler up, filtering based on that may not be
much help either. I seem to recall reading somewhere that it's possible to
configure Apache to recognize "executables" independently of the OS's file
extensions and associations; if that's true, it might lead to a
solution to your problem.
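
For what it's worth, here's a minimal sketch of the honeypot idea (not my
exact setup; the trap path, map name, and file locations are all
illustrative, and it assumes mod_rewrite is loaded and the trap script
lives in an already-ScriptAliased /cgi-bin/). First, list a trap URL in
robots.txt that no legitimate visitor would ever request:

    User-agent: *
    Disallow: /cgi-bin/bot-trap

Then, in httpd.conf (server or vhost context; RewriteMap doesn't work in
.htaccess), refuse anyone whose address has landed in the block map:

    RewriteEngine On
    RewriteMap blocked txt:/var/www/blocked.txt
    RewriteCond ${blocked:%{REMOTE_ADDR}|0} !=0
    RewriteRule .* - [F]

The trap itself can be a trivial CGI that records the visitor, e.g.:

    #!/bin/sh
    # /cgi-bin/bot-trap: append the offender's address to the block map
    # (the file must be writable by the httpd user); every later request
    # from that address then gets a 403 from the rules above.
    echo "$REMOTE_ADDR 1" >> /var/www/blocked.txt
    echo "Content-Type: text/plain"
    echo ""
    echo "Goodbye."

Only a robot that read robots.txt and deliberately crawled the excluded
path, or one that ignored robots.txt and followed a hidden link to it,
will ever hit the trap.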
Mark
-------- Original Message --------
Subject: [EMAIL PROTECTED] Blocking crawling of CGIs
From: Tony Rice (trice) <[EMAIL PROTECTED]>
To: users@httpd.apache.org
Date: Tuesday, September 18, 2007 11:24:20 AM
We've had some instances where crawlers have stumbled onto a CGI script
which links to itself and started pounding the server with requests to
that CGI.
There are so many CGI scripts on this server that I don't want to
maintain a huge robots.txt file. Any suggestions on other techniques to
keep crawlers away from CGI scripts? Maybe check the User-Agent with
BrowserMatch and then do something creative with "deny from env="?
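
For concreteness, that approach might look something like this in a
2.2-style config (the User-Agent substrings below are just examples, and
it only catches crawlers that identify themselves honestly):

    BrowserMatchNoCase "googlebot|slurp|msnbot|spider|crawl" is_crawler

    <Directory "/usr/local/apache2/cgi-bin">
        Order Allow,Deny
        Allow from all
        Deny from env=is_crawler
    </Directory>

With Order Allow,Deny, the Deny overrides the Allow, so any request that
sets is_crawler gets a 403 on the CGI directory while everyone else gets
through.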
---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: [EMAIL PROTECTED]
" from the digest: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]