> Hi all,
> 
> does anyone have an effective way to block bots and spiders?
> So many of them ignore robots.txt - besides Facebook and tons of
> Java/xyz clients there are plenty of other illegitimate spiders around.
> 
> So does anyone have a performance-friendly way to block them?
> Or do you think that if performance matters, it's better to just leave them crawling...
> 
> regards
> 
> Henrik

The only thing I can think of is checking the request.META dictionary. It 
contains the HTTP_HOST and HTTP_USER_AGENT values sent by the client. You 
could check whether those look valid, though of course a spider could fake 
them. I'm guessing not all spiders bother, so you could at least filter out 
the ones that don't set these values at all.
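
Something like this in a view could be a starting point (untested, and the 
user-agent substrings below are just made-up examples, not a real blocklist):

from django.http import HttpResponseForbidden

# Substrings we consider suspicious in a User-Agent header (examples only).
BAD_AGENT_SUBSTRINGS = ("java/", "libwww", "python-urllib")

def my_view(request):
    user_agent = request.META.get("HTTP_USER_AGENT", "")
    # Reject requests with no User-Agent at all, or with a suspicious one.
    if not user_agent or any(s in user_agent.lower() for s in BAD_AGENT_SUBSTRINGS):
        return HttpResponseForbidden("Forbidden")
    # ... normal view code goes here ...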

It would probably require lots of heavy regular-expression code (since there 
are so many valid client headers), which would best be implemented as a 
decorator on each view function, or you could put the code in a middleware. 
Either way, filtering out those spiders will add some overhead to every 
request.
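
As a middleware it might look roughly like this (again untested; the regular 
expression is only an example, and you would list the class in 
MIDDLEWARE_CLASSES in settings.py):

import re
from django.http import HttpResponseForbidden

# Example pattern only -- extend or replace it with whatever agents you
# actually want to block.
BLOCKED_AGENTS = re.compile(r"(java/|libwww|wget|curl)", re.IGNORECASE)

class BlockSpidersMiddleware(object):
    def process_request(self, request):
        user_agent = request.META.get("HTTP_USER_AGENT", "")
        if not user_agent or BLOCKED_AGENTS.search(user_agent):
            # Returning a response here stops processing before the view runs.
            return HttpResponseForbidden("Forbidden")
        return None  # otherwise continue to the view as usual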

Again I have no experience on this field, it's just an idea that might be 
possible.

Regards,

Jonas.
