Hello, I just saw this article, which seems pertinent to this discussion as well.
https://techcrunch.com/2025/03/27/open-source-devs-are-fighting-ai-crawlers-with-cleverness-and-vengeance/?guccounter=1

The article features a new bot-blocking proxy, released on March 19th, that might be worth evaluating. Anubis <https://xeiaso.net/blog/2025/anubis/> is:

> a reverse proxy proof-of-work check that must be passed before requests
> are allowed to hit a Git server. It blocks bots but lets through browsers
> operated by humans.

Looks like it is a Docker image that would run between your load balancer and your Evergreen servers:

https://anubis.techaro.lol/docs/admin/installation

Josh

On Tue, Mar 25, 2025 at 1:08 PM JonGeorg SageLibrary via Evergreen-general <[email protected]> wrote:

> Thank you to everyone who responded. We're working with our vendor to see
> what can be done. I appreciate the responses.
> -Jon
>
> On Wed, Mar 19, 2025 at 10:55 AM Kev Woolley <[email protected]> wrote:
>
>> Hi Jon,
>>
>> We use CrowdSec: https://www.crowdsec.net/
>>
>> It allows you to define your own scenarios for making decisions on
>> incoming traffic and mitigating it automatically: firewall or other
>> banning measures, throwing up a CAPTCHA, and more.
>>
>> Note that CrowdSec doesn't work with any time slice above 48 hours -- all
>> of its mitigations are very short-lived. We are combining this with a
>> substantial long-term blocklist (implemented as an ipset block in Linux
>> iptables) that subsumes the functionality of both geo and provider blocks
>> for longer-term mitigations. This is, of course, a labour-heavy endeavour,
>> but we've tried several alternatives, and this is what's working best so
>> far.
>>
>> We have scenarios defined to catch user-agent traits and block user
>> agents that seem bad. After some initial learning ("oh, so this version
>> of MS Office says it's MSIE 7.0, so a library just blocked themselves --
>> oops!" and similar situations), it was pretty easy to catch most bot
>> traffic that way.
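[Editor's note: the ipset-based long-term blocklist mentioned above can be loaded along these lines. This is a sketch, not Kev's actual tooling; the set name (`ltb`), file names, and CIDRs (documentation ranges) are all hypothetical.]

```shell
#!/bin/sh
# Sketch: turn a plain list of CIDRs into `ipset restore` input, which
# loads large sets far faster than one `ipset add` call per entry.
# blocklist.txt stands in for a real long-term blocklist file.
cat > blocklist.txt <<'EOF'
203.0.113.0/24
198.51.100.0/24
EOF

{
  echo "create ltb hash:net"
  sed 's/^/add ltb /' blocklist.txt
} > ltb.restore

# Then, as root (-exist makes re-runs idempotent):
#   ipset -exist restore < ltb.restore
#   iptables -I INPUT -m set --match-set ltb src -j DROP
```

The `hash:net` set type stores whole netblocks, so one iptables rule covers the entire list no matter how large it grows.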
>> As I get time (and more familiarity with writing the scenarios), I'll be
>> designing scenarios that look for specific behaviours (such as grabbing
>> the links on a page in order, too quickly) and improving our defence
>> that way.
>>
>> CS offers reasonably good visualisation and reporting tools. These are
>> useful both for keeping track of who's doing what and for spotting the
>> persistent threats and creating entries in the long-term blocklist for
>> them.
>>
>> My observation, even very recently as I've been working on the long-term
>> blocklist without updating it on our servers (working with ~10k rules
>> takes a while), is that there doesn't seem to be a point where one can
>> take their eyes off the issue entirely and forget about it -- new
>> traffic keeps coming out of the woodwork. A substantial enough long-term
>> blocklist can reduce the time spent to a reasonable amount, but there
>> doesn't seem to be an "okay, we're done here" point.
>>
>> My gut feel is that 30-50k long-term blocklist rules is where we may end
>> up eventually (after some years of building them).
>>
>> I'm happy to share what I've got in the LTB. It's been built over the
>> last several months, based on the attacks we've received.
>>
>> Resources I've found helpful include:
>>
>> https://www.qurium.org/ -- their digital forensics and investigations
>> pages have a lot of good info on the methods and actors behind some
>> types of attacks. We experienced this flavour in particular:
>>
>> https://www.qurium.org/weaponizing-proxy-and-vpn-providers/fineproxy-rayobyte/
>>
>> Finding this site helped confirm a lot of information I'd found over the
>> previous couple of years, studying these things on my own.
>>
>> https://www.radb.net/ -- you can query this for free, and it's a good
>> way to look up network information without having to bounce around
>> between ARIN, RIPE, APNIC, and the other RIRs (Regional Internet
>> Registries).
>> You can do advanced queries against it with a whois client, as well:
>>
>> whois -h whois.radb.net -- '-i origin AS714'
>>
>> The above command will give a list of everything originating from one of
>> Apple's ASNs (Autonomous System Numbers; these are used to help manage
>> routing). For example:
>>
>> whois -h whois.radb.net -- '-i origin AS55185'
>>
>> gives:
>>
>> route:          209.87.62.0/24
>> origin:         AS55185
>> descr:          750 - 555 Seymour Street
>>                 Vancouver BC V6B-3H6
>>                 Canada
>> admin-c:        HOSTM458-ARIN
>> tech-c:         NOC33711-ARIN
>> mnt-by:         MNT-BC-Z
>> created:        2023-12-07T21:58:41Z
>> last-modified:  2023-12-07T21:58:41Z
>> source:         ARIN
>> rpki-ov-state:  valid
>>
>> route6:         2607:f8f0:6a0::/48
>> origin:         AS55185
>> descr:          750 - 555 Seymour Street
>>                 Vancouver BC V6B-3H6
>>                 Canada
>> admin-c:        HOSTM458-ARIN
>> tech-c:         NOC33711-ARIN
>> mnt-by:         MNT-BC
>> created:        2023-12-07T22:00:06Z
>> last-modified:  2023-12-07T22:00:06Z
>> source:         ARIN
>> rpki-ov-state:  valid
>>
>> With a bit of scripting, it's not difficult to pull out the route: and
>> route6: lines, run them through aggregate (a tool that removes
>> duplication and shadowing from lists of netblocks, giving you the
>> shortest possible list of netblocks covering all of the provided
>> addresses), and output them to a file for validation and addition to
>> whatever solution you're using.
>>
>> It's a huge topic, and I've already babbled long enough. I'm happy to
>> give info or lend a hand, though. It's a hard problem.
>>
>> Thank you,
>>
>> Kev
>>
>> --
>> Kev Woolley (they/them)
>>
>> Gratefully acknowledging that I live and work in the unceded traditional
>> territories of the Səl̓ílwətaɬ (Tsleil-Waututh) and Sḵwx̱wú7mesh Úxwumixw.
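[Editor's note: the "bit of scripting" Kev describes could look like the sketch below. The file names are illustrative, and a captured sample of the AS55185 output shown above stands in for a live RADb query.]

```shell
#!/bin/sh
# Sketch: extract route:/route6: netblocks from RADb whois output.
# In practice you would capture live output first, e.g.:
#   whois -h whois.radb.net -- '-i origin AS55185' > radb-AS55185.txt
# Here a trimmed sample of that output stands in for the live query.
cat > radb-AS55185.txt <<'EOF'
route:          209.87.62.0/24
origin:         AS55185
source:         ARIN
route6:         2607:f8f0:6a0::/48
origin:         AS55185
source:         ARIN
EOF

# Take field 2 of every route:/route6: line; sort -u drops duplicates.
# Piping the v4 list through aggregate(1) (or the v6 list through its
# IPv6-aware counterpart) would then merge shadowed/adjacent netblocks.
awk '/^route6?:/ { print $2 }' radb-AS55185.txt | sort -u > netblocks.txt
cat netblocks.txt
```

The resulting one-netblock-per-line file is the format most blocklist tooling (ipset included) expects as input.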
>>
>> ________________________________________
>> From: JonGeorg SageLibrary via Evergreen-general <[email protected]>
>> Sent: 19 March 2025 08:52
>> To: Evergreen Discussion Group
>> Cc: JonGeorg SageLibrary
>> Subject: [Evergreen-general] Bot issues
>>
>> We've been dealing with a lot of bots crawling our catalog and
>> overwhelming our app servers.
>>
>> Are any of you having the same issue, and if so, what tools are you
>> using to remedy the situation?
>>
>> We've already implemented geoblocking to limit traffic to the US and
>> Canada, after being overwhelmed by queries from overseas.
>>
>> I've been looking at Bad Bot Blocker as an option.
>> -Jon
_______________________________________________
Evergreen-general mailing list -- [email protected]
To unsubscribe send an email to [email protected]
