On 06/03/2018 01:47 PM, Kornel Benko wrote:
> On Sunday, 3 June 2018 19:29:08 CEST, Jean-Marc Lasgouttes
> <lasgout...@lyx.org> wrote:
>
> > On 03/06/2018 at 18:57, Richard Kimberly Heck wrote:
> > > The server has been all but dead today. I found that the trac database
> > > was being scanned from crawl.sogou.com, which was apparently ignoring
> > > our robots.txt file. I've added
> > >
> > >   Order allow,deny
> > >   Allow from all
> > >   Deny from crawl.sogou.com
> > >
> > > to the httpd configuration for trac, and that seems to have solved the
> > > problem.
> > >
> > > If there are similar issues later, we can do similar things.
> >
> > Excellent idea. I did restart httpd earlier, hoping it would do
> > something (which it did not), but I see you are resorting to heavy-handed
> > techniques instead :)
> >
> > JMarc
>
> According to this page
> https://www.keycdn.com/blog/web-crawlers/
> this looks like a very bad Chinese crawler.
>
> Citing:
>
> > Sogou Spider is the web crawler for Sogou.com, a leading Chinese search
> > engine that was launched in 2004. As of April 2016 it has a rank of 103 in
> > Alexa’s internet rankings. Note: The Sogou web spider does not respect the
> > robots.txt internet standard, and is therefore banned from many websites
> > because of excessive crawling.

One more from which it's banned, then!

Riki
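
The `Order`/`Allow`/`Deny` directives quoted above are the Apache 2.2 access-control syntax; on Apache 2.4 they only work when mod_access_compat is loaded. A roughly equivalent block using the 2.4 `Require` directives (mod_authz_core and mod_authz_host) might look like the following. This is a minimal sketch, assuming trac is served from a `<Location>` block; the `/trac` path is hypothetical.

```apache
# Hypothetical path: adjust to wherever trac is actually mounted.
<Location "/trac">
    # Grant access to everyone except clients whose hostname is, or ends in,
    # crawl.sogou.com. Host matching triggers a double reverse DNS lookup.
    <RequireAll>
        Require all granted
        Require not host crawl.sogou.com
    </RequireAll>
</Location>
```

Because host-based rules cost a DNS lookup per request, blocking the crawler's address ranges with `Require not ip` avoids that overhead, provided those ranges are known.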
