On Sunday, 3 June 2018 at 19:29:08 CEST, Jean-Marc Lasgouttes
<lasgout...@lyx.org> wrote:
> On 03/06/2018 at 18:57, Richard Kimberly Heck wrote:
> > The server has been all but dead today. I found that the trac database
> > was being scanned from crawl.sogou.com, which was apparently ignoring
> > our robots.txt file. I've added
> > 
> >    Order allow,deny
> >    Allow from all
> >    Deny from crawl.sogou.com
> > 
> > to the httpd configuration for trac, and that seems to have solved the
> > problem.
> > 
> > If there are similar issues later, we can do similar things.
> 
> Excellent idea. I did restart httpd earlier, hoping it would do
> something (which it did not), but I see you are resorting to
> heavy-handed techniques instead :)
> 
> JMarc
> 

According to this page
        https://www.keycdn.com/blog/web-crawlers/
this looks like a very bad Chinese crawler.
Citing:
>       Sogou Spider is the web crawler for Sogou.com, a leading Chinese search
>       engine that was launched in 2004. As of April 2016 it has a rank of 103
>       in Alexa’s internet rankings. Note: The Sogou web spider does not respect
>       the robots.txt internet standard, and is therefore banned from many
>       websites because of excessive crawling.

        Kornel
