Hi Collin,

Collin Funk wrote:
> I remember recently cgit was disabled due to abuse.
We have never disabled cgit entirely, but we have implemented various
anti-abuse measures around it. We did disable the Subversion ViewVC CGI
between January 18 and January 20 due to abuse. That was two days during
the worst of an AI scraper bot wave.

> I don't know the exact details, but I saw that FreeBSD's cgit was
> having similar issues with being taken down by bots crawling.

The AI scraper bots have been pummeling the net. If only they would
clone the repositories, they would have everything, rather than trying
to crawl every revision of every branch of every repository, rendered
for human viewing!

> I figured I would share an article about how they worked around it
> in case it is helpful [1].
>
> [1] https://blog.sysopscafe.com/posts/ai-crawlers-hammering-git-repos/

Nice! Thanks for sharing that article; I hadn't seen it yet. It's an
interesting idea to implement a separate rate limit specifically for
URLs that contain an id in the query string. I will investigate that
tactic and see how it works for us.

The current main abuse problem, though, is not an AI web scraper,
although those are still out in force. It's never just one abuse agent,
because there are always many operating concurrently. The main abuse
agent now is a HUGE botnet, 5M+ strong, which is hitting the cgit.cgi
interface with nonsense URLs. It's just flinging noise at it, really;
it's definitely not usefully scraping. I think someone made a
programming mistake in a Mirai botnet and released it simply to create
havoc. Fortunately, being nonsense by its very nature, it has a very
positively identifying signature fingerprint.

Bob
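
P.S. For anyone curious, here is a rough sketch of the tactic the
article describes: a separate, tighter rate limit applied only to
requests whose query string carries an id parameter. This is not our
actual setup; the limiter, its limits, and the function names are all
illustrative assumptions.

```python
import time
from collections import defaultdict, deque
from urllib.parse import urlsplit, parse_qs

# Illustrative limits: at most 10 id-bearing requests per client
# in any 60-second sliding window. Real values would need tuning.
ID_LIMIT = 10
ID_WINDOW = 60.0

# client address -> timestamps of recent id-bearing requests
_hits = defaultdict(deque)

def allow(client_addr, url, now=None):
    """Return True if the request may proceed, False if it should be rejected."""
    now = time.monotonic() if now is None else now
    query = parse_qs(urlsplit(url).query)
    if "id" not in query:
        # Only URLs with an id in the query string are subject to this limit.
        return True
    window = _hits[client_addr]
    # Drop hits that have aged out of the sliding window.
    while window and now - window[0] > ID_WINDOW:
        window.popleft()
    if len(window) >= ID_LIMIT:
        return False
    window.append(now)
    return True
```

The point of keying on the id parameter is that those are exactly the
per-revision rendering URLs the scrapers hammer, while ordinary browsing
and clones pass through untouched.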