On 06/03/2018 01:47 PM, Kornel Benko wrote:
>
> Am Sonntag, 3. Juni 2018 19:29:08 CEST schrieb Jean-Marc Lasgouttes
> <lasgout...@lyx.org>:
> > > Le 03/06/2018 à 18:57, Richard Kimberly Heck a écrit :
> > > > The server has been all but dead today. I found that the trac database
> > > > was being scanned from crawl.sogou.com, which was apparently ignoring
> > > > our robots.txt file. I've added
> > > >
> > > > Order allow,deny
> > > > Allow from all
> > > > Deny from crawl.sogou.com
> > > >
> > > > to the httpd configuration for trac, and that seems to have solved the
> > > > problem.
> > > >
> > > > If there are similar issues later, we can do similar things.
> > >
> > > Excellent idea. I did restart httpd earlier, hoping it would do
> > > something (which it did not), but I see you are resorting to heavy
> > > handed techniques instead :)
> > >
> > > JMarc
> >
> > According to this page
> > https://www.keycdn.com/blog/web-crawlers/
> > this looks like a very bad Chinese crawler.
> > Citing:
> > > Sogou Spider is the web crawler for Sogou.com, a leading Chinese search
> > > engine that was launched in 2004. As of April 2016 it has a rank of 103 in
> > > Alexa’s internet rankings. Note: The Sogou web spider does not respect the
> > > robots.txt internet standard, and is therefore banned from many websites
> > > because of excessive crawling.
One more from which it's banned, then!

Riki
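
PS: For anyone doing this on a newer Apache, here is a minimal sketch of the
same block in 2.4 syntax, where Order/Allow/Deny was replaced by
mod_authz_host's Require directives. The <Location "/trac"> path is only
illustrative; adjust it to wherever our trac instance is actually mounted:

    <Location "/trac">
        # Apache 2.4: Require (mod_authz_host) replaces Order/Allow/Deny.
        # Host matching does a reverse DNS lookup on the client address,
        # just like the old "Deny from crawl.sogou.com".
        <RequireAll>
            Require all granted
            Require not host crawl.sogou.com
        </RequireAll>
    </Location>

If the crawler ever shows up from addresses without matching reverse DNS,
the usual fallback is to match its User-Agent with BrowserMatchNoCase
(mod_setenvif) and deny on that environment variable instead.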