Hello, I just saw this article, which seems pertinent to this discussion as well.
https://techcrunch.com/2025/03/27/open-source-devs-are-fighting-ai-crawlers-with-cleverness-and-vengeance/?guccounter=1

The article features a new bot-blocking proxy, released on March 19th, that might be worth evaluating. Anubis <https://xeiaso.net/blog/2025/anubis/> is:

> a reverse proxy proof-of-work check that must be passed before requests
> are allowed to hit a Git server. It blocks bots but lets through browsers
> operated by humans.

Looks like it is a Docker image that would run between your load balancer and your Evergreen servers:

https://anubis.techaro.lol/docs/admin/installation

Josh

On Tue, Mar 25, 2025 at 1:08 PM JonGeorg SageLibrary via Evergreen-general <[email protected]> wrote:

> Thank you to everyone who responded. We're working with our vendor to see
> what can be done. I appreciate the responses.
> -Jon
>
> On Wed, Mar 19, 2025 at 10:55 AM Kev Woolley <[email protected]> wrote:
>
>> Hi Jon,
>>
>> We use CrowdSec: https://www.crowdsec.net/
>>
>> It allows you to define your own scenarios for making decisions on
>> incoming traffic and mitigating it automatically: firewall or other
>> banning measures, throwing up a CAPTCHA, and more.
>>
>> Note that CrowdSec doesn't work with any time slice above 48 hours -- all
>> of its mitigations are very short-lived. We are combining this with a
>> substantial long-term blocklist (implemented as an ipset block in Linux
>> iptables) that subsumes the functionality of both geo and provider blocks
>> for longer-term mitigations. This is, of course, a labour-heavy endeavour,
>> but we've tried several alternatives, and this is what's working best so
>> far.
>>
>> We have scenarios defined to catch user-agent traits and block user
>> agents that seem bad. After some initial learning ("oh, so this version
>> of MS Office says it's MSIE 7.0, so a library just blocked themselves --
>> oops!" and similar situations), it was pretty easy to catch most bot
>> traffic that way.
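[Editor's note: the ipset-based long-term blocklist mentioned above can be loaded along these lines. This is a sketch, not Kev's actual tooling; the set name (`ltb`), file names, and CIDRs (documentation ranges) are all hypothetical.]

```shell
#!/bin/sh
# Sketch: turn a plain list of CIDRs into `ipset restore` input, which
# loads large sets far faster than one `ipset add` call per entry.
# blocklist.txt stands in for a real long-term blocklist file.
cat > blocklist.txt <<'EOF'
203.0.113.0/24
198.51.100.0/24
EOF

{
  echo "create ltb hash:net"
  sed 's/^/add ltb /' blocklist.txt
} > ltb.restore

# Then, as root (-exist makes re-runs idempotent):
#   ipset -exist restore < ltb.restore
#   iptables -I INPUT -m set --match-set ltb src -j DROP
```

The `hash:net` set type stores whole netblocks, so one iptables rule covers the entire list no matter how large it grows.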
>> As I get time (and more familiarity with writing the scenarios), I'll be
>> designing scenarios that look for specific behaviours (such as grabbing
>> the links on a page in order, too quickly) and improving our defence
>> that way.
>>
>> CS offers reasonably good visualisation and reporting tools. These are
>> useful both for keeping track of who's doing what and for spotting the
>> persistent threats and creating entries in the long-term blocklist for
>> them.
>>
>> My observation, even very recently as I've been working on the long-term
>> blocklist without updating it on our servers (working with ~10k rules
>> takes a while), is that there doesn't seem to be a point where one can
>> take their eyes off the issue entirely and forget about it -- new
>> traffic keeps coming out of the woodwork. A substantial enough long-term
>> blocklist can reduce the time spent to a reasonable amount, but there
>> doesn't seem to be an "okay, we're done here" point.
>>
>> My gut feel is that 30-50k long-term blocklist rules is where we may end
>> up eventually (after some years of building them).
>>
>> I'm happy to share what I've got in the LTB. It's been built over the
>> last several months, based on the attacks we've received.
>>
>> Resources I've found helpful include:
>>
>> https://www.qurium.org/ -- their digital forensics and investigations
>> pages have a lot of good info on the methods and actors behind some
>> types of attacks. We experienced this flavour in particular:
>>
>> https://www.qurium.org/weaponizing-proxy-and-vpn-providers/fineproxy-rayobyte/
>>
>> Finding this site helped confirm a lot of information I'd found over the
>> previous couple of years, studying these things on my own.
>>
>> https://www.radb.net/ -- you can query this for free, and it's a good
>> way to look up network information without having to bounce around
>> between ARIN, RIPE, APNIC, and the other RIRs (Regional Internet
>> Registries).
>> You can do advanced queries against it with a whois client, as well:
>>
>> whois -h whois.radb.net -- '-i origin AS714'
>>
>> The above command will give a list of everything originating from one of
>> Apple's ASNs (Autonomous System Numbers; these are used to help manage
>> routing). For example:
>>
>> whois -h whois.radb.net -- '-i origin AS55185'
>>
>> gives:
>>
>> route:          209.87.62.0/24
>> origin:         AS55185
>> descr:          750 - 555 Seymour Street
>>                 Vancouver BC V6B-3H6
>>                 Canada
>> admin-c:        HOSTM458-ARIN
>> tech-c:         NOC33711-ARIN
>> mnt-by:         MNT-BC-Z
>> created:        2023-12-07T21:58:41Z
>> last-modified:  2023-12-07T21:58:41Z
>> source:         ARIN
>> rpki-ov-state:  valid
>>
>> route6:         2607:f8f0:6a0::/48
>> origin:         AS55185
>> descr:          750 - 555 Seymour Street
>>                 Vancouver BC V6B-3H6
>>                 Canada
>> admin-c:        HOSTM458-ARIN
>> tech-c:         NOC33711-ARIN
>> mnt-by:         MNT-BC
>> created:        2023-12-07T22:00:06Z
>> last-modified:  2023-12-07T22:00:06Z
>> source:         ARIN
>> rpki-ov-state:  valid
>>
>> With a bit of scripting, it's not difficult to pull out the route: and
>> route6: lines, run them through aggregate (a tool that removes
>> duplication and shadowing from lists of netblocks, giving you the
>> shortest possible list of netblocks covering all of the provided
>> addresses), and output them to a file for validation and addition to
>> whatever solution you're using.
>>
>> It's a huge topic, and I've already babbled long enough. I'm happy to
>> give info or lend a hand, though. It's a hard problem.
>>
>> Thank you,
>>
>> Kev
>>
>> --
>> Kev Woolley (they/them)
>>
>> Gratefully acknowledging that I live and work in the unceded traditional
>> territories of the Səl̓ílwətaɬ (Tsleil-Waututh) and Sḵwx̱wú7mesh Úxwumixw.
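[Editor's note: the "bit of scripting" Kev describes could look like the sketch below. The file names are illustrative, and a captured sample of the AS55185 output shown above stands in for a live RADb query.]

```shell
#!/bin/sh
# Sketch: extract route:/route6: netblocks from RADb whois output.
# In practice you would capture live output first, e.g.:
#   whois -h whois.radb.net -- '-i origin AS55185' > radb-AS55185.txt
# Here a trimmed sample of that output stands in for the live query.
cat > radb-AS55185.txt <<'EOF'
route:          209.87.62.0/24
origin:         AS55185
source:         ARIN
route6:         2607:f8f0:6a0::/48
origin:         AS55185
source:         ARIN
EOF

# Take field 2 of every route:/route6: line; sort -u drops duplicates.
# Piping the v4 list through aggregate(1) (or the v6 list through its
# IPv6-aware counterpart) would then merge shadowed/adjacent netblocks.
awk '/^route6?:/ { print $2 }' radb-AS55185.txt | sort -u > netblocks.txt
cat netblocks.txt
```

The resulting one-netblock-per-line file is the format most blocklist tooling (ipset included) expects as input.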
>>
>> ________________________________________
>> From: JonGeorg SageLibrary via Evergreen-general <[email protected]>
>> Sent: 19 March 2025 08:52
>> To: Evergreen Discussion Group
>> Cc: JonGeorg SageLibrary
>> Subject: [Evergreen-general] Bot issues
>>
>> We've been dealing with a lot of bots crawling our catalog and
>> overwhelming our app servers.
>>
>> Are any of you having the same issue, and if so, what tools are you
>> using to remedy the situation?
>>
>> We've already implemented geoblocking to limit traffic to the US and
>> Canada, after being overwhelmed by queries from overseas.
>>
>> I've been looking at Bad Bot Blocker as an option.
>> -Jon
_______________________________________________
Evergreen-general mailing list -- [email protected]
To unsubscribe send an email to [email protected]
