Hi Collin,

Collin Funk wrote:
> I remember recently cgit was disabled due to abuse.

We have never disabled cgit entirely.  But we have implemented various
anti-abuse measures around cgit.

We did disable the Subversion ViewVC CGI due to abuse between
January 18 and January 20, two days during the worst of an AI
scraper bot wave.

> I don't know the exact details, but I saw that FreeBSD's cgit was
> having similar issues with being taken down by bots crawling.

The AI scraper bots have been pummeling the net.  If only they would
clone the repositories, they would have everything, rather than trying
to crawl every revision of every branch of every repository, rendering
each page for human viewing!

> I figured I would share an article about how they worked around it
> in case it is helpful [1].
> [1] https://blog.sysopscafe.com/posts/ai-crawlers-hammering-git-repos/

Nice!  Thanks for sharing that article.  I hadn't seen it yet.

It's an interesting idea to implement a separate rate limit
specifically for URLs that contain an id in the query string.  I will
investigate that tactic and see how it works for us.
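For anyone curious what that tactic might look like, here is a minimal
sketch of a per-client limiter that applies a stricter budget to URLs
whose query string carries an "id" parameter (the expensive
per-revision pages).  The class name, limits, and bucketing rule are
all illustrative assumptions, not anyone's production config:

```python
# Sketch: stricter rate limit for URLs with an "id" query parameter.
# Limits and names are hypothetical, for illustration only.
import time
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

class IdAwareRateLimiter:
    def __init__(self, normal_per_min=120, id_per_min=10):
        # Two budgets: cheap pages vs. expensive ?id=... revision pages.
        self.limits = {"normal": normal_per_min, "id": id_per_min}
        self.hits = defaultdict(list)  # (client, bucket) -> timestamps

    def allow(self, client_ip, url, now=None):
        now = time.time() if now is None else now
        query = parse_qs(urlsplit(url).query)
        bucket = "id" if "id" in query else "normal"
        key = (client_ip, bucket)
        # Keep only hits inside a sliding 60-second window.
        window = [t for t in self.hits[key] if now - t < 60.0]
        if len(window) >= self.limits[bucket]:
            self.hits[key] = window
            return False
        window.append(now)
        self.hits[key] = window
        return True
```

In practice one would likely express the same idea in the web server's
own rate-limiting configuration rather than application code, but the
split-bucket logic is the same.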

The current main abuse problem, though, is not an AI web scraper,
although those are still out in force.  It's never just one abuse
agent, because there are always many operating concurrently.  The main
abuse agent now is a HUGE botnet, 5M+ strong, which is hitting the
cgit.cgi interface with nonsense URLs.  It's just flinging noise at
it, really; it's definitely not usefully scraping.  I think someone
made a programming mistake in a Mirai botnet and released it simply to
create havoc.  Fortunately, being nonsense by its very nature, it has
a very positively identifying signature.
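As one hypothetical illustration of that kind of fingerprint (not the
actual signature being matched, which I'd rather not publish), nonsense
requests tend to carry query parameters cgit never defines, so a filter
can simply check the parameter names against the known set.  The
parameter list below is partial and illustrative:

```python
# Sketch: flag cgit.cgi requests whose query parameters are ones cgit
# never defines.  Illustrative only; the real botnet signature differs.
from urllib.parse import urlsplit, parse_qs

# Partial, illustrative set of query parameters cgit understands.
KNOWN_PARAMS = {"url", "id", "id2", "h", "q", "qt", "ofs",
                "ss", "all", "showmsg", "follow"}

def looks_like_noise(url):
    """Return True for a cgit.cgi request using unknown query params."""
    parts = urlsplit(url)
    if "cgit.cgi" not in parts.path:
        return False
    params = parse_qs(parts.query, keep_blank_values=True)
    return any(p not in KNOWN_PARAMS for p in params)
```

A rule like this could feed a fail2ban-style jail or a server-level
deny list once a client trips it enough times.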

Bob