On 11/01/2016 03:25 PM, Jean-Marc Lasgouttes wrote:
> On 01/11/2016 at 03:52, Joel Kulesza wrote:
>> Does anyone know where the instability is coming from (hardware,
>> software, provider, etc.)?
>
> I would say that the problem was a visit from crawl.sogou.com
> (220.181.125.68), which pulled down 700MiB of data (along with a
> crawler on Amazon AWS that accounted for another 320MiB). These are
> badly behaved bots that do not respect our robots.txt file, in which
> we explicitly ask crawlers to skip indexing the /trac subdirectory.
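
For reference, that request in robots.txt presumably looks something
like the stanza below (I'm reconstructing it; the actual file may
differ):

    # Ask all crawlers to stay out of the Trac instance
    User-agent: *
    Disallow: /trac

Of course, robots.txt has no enforcement mechanism: well-behaved
crawlers honor it voluntarily, which is exactly why these bots can
ignore it.
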
>
> I can blacklist these bots by hand, but AFAIK they are gone now and
> the harm is done. It also seems possible to automatically download a
> blacklist of known bad crawlers and block them, but I do not know
> whether that works well in practice.
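
If they return, blacklisting by hand can be as simple as a firewall
drop rule, e.g. with iptables (assuming a Linux host with shell
access; the address below is the sogou crawler mentioned above):

    # Silently drop all packets from the offending crawler
    iptables -A INPUT -s 220.181.125.68 -j DROP

As Jean-Marc says, though, by the time such a rule is in place the
crawler has usually moved on.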

What about rate limiting access to the trac/ subdirectory? The problem
is that these bots hit us too many times, too fast. Refusing to serve
them at that rate won't make them go away, but it would keep them from
taking us down.
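
As a sketch, assuming the site runs nginx (Apache has comparable
options, e.g. mod_evasive), the limit could look like this:

    # Track request rate per client IP; allow 2 requests/second on average
    limit_req_zone $binary_remote_addr zone=trac:10m rate=2r/s;

    server {
        location /trac {
            # Permit short bursts of up to 10 requests; anything beyond
            # that gets a cheap HTTP 503 instead of a Trac page render
            limit_req zone=trac burst=10;
        }
    }

Legitimate users would rarely hit the limit, while a bot hammering
trac/ would mostly collect 503s.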

Richard
