Hi Jon,

We use CrowdSec: https://www.crowdsec.net/

It lets you define your own scenarios for making decisions about incoming 
traffic and mitigating it automatically: firewall or other banning measures, 
throwing up a CAPTCHA, and more.

Note that CrowdSec doesn't work in any time slice above 48 hours -- all of its 
mitigations are very short-lived. We are combining this with a substantial 
long-term blocklist (implemented as an ipset block in Linux iptables) that 
subsumes the functionality of both geo and provider blocks for longer-term 
mitigations. This is, of course, a labour-heavy endeavour, but we've tried 
several alternatives, and this is what's working best so far.
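As a rough sketch of the ipset side (the set name "ltb" and the blocklist 
path are hypothetical; adapt to your own setup):

```shell
# Load a long-term blocklist (one CIDR per line) into an ipset
# and drop anything that matches. Needs root.
ipset create ltb hash:net -exist
while read -r net; do
    ipset add ltb "$net" -exist
done < /etc/ltb/blocklist.txt
# Drop matching sources early in the INPUT chain:
iptables -I INPUT -m set --match-set ltb src -j DROP
```

For lists in the ~10k range, `ipset restore` against a prepared dump file is 
much faster than adding entries one at a time, and building a fresh set then 
`ipset swap`-ing it in avoids a window with no rules loaded.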

We have scenarios defined to catch user-agent traits and block user agents 
that seem bad. After some initial learning ("oh, so this version of MS Office 
says it's MSIE 7.0, so a library just blocked themselves -- oops!" and similar 
situations), it was pretty easy to catch most bot traffic that way. As I get 
time (and more familiarity with writing the scenarios) I'll be designing 
scenarios that look for specific behaviours (such as grabbing the links on a 
page in order, too quickly) and improving our defence that way.

CS offers reasonably good visualisation and reporting tools. These are useful 
both for keeping track of who's doing what and for spotting the persistent 
threats and creating entries in the long-term blocklist for those.
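On the command-line side, cscli (the CLI bundled with CrowdSec) covers the 
basics; these are standard subcommands, though they obviously need a running 
CrowdSec install, and the exact output depends on your setup:

```shell
cscli metrics          # parser/scenario/decision counters
cscli alerts list      # recent alerts: scenario, source IP, etc.
cscli decisions list   # active bans/captchas and when they expire
```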

My observation, reinforced very recently as I've been working on the 
long-term blocklist without updating it on our servers (working with ~10k 
rules takes a while), is that there really doesn't seem to be a point where 
one can take their eyes off the issue entirely and forget about it -- new 
traffic keeps coming out of the woodwork. A substantial enough long-term 
blocklist can reduce the time spent to a reasonable amount, but there doesn't 
seem to be an "okay, we're done here" point.

My gut feel is that 30-50k long-term blocklist rules is where we may end up 
eventually (with some years of building them).

I'm happy to share what I've got in the LTB. It's been built over the last 
several months, based on the attacks we've received.

Resources I've found helpful include:

https://www.qurium.org/ -- their digital forensics and investigations pages 
have a lot of good info on the methods and actors for some types of attacks -- 
we experienced this flavour, in particular:

https://www.qurium.org/weaponizing-proxy-and-vpn-providers/fineproxy-rayobyte/

Finding this site helped confirm a lot of information I'd found over the 
previous couple of years, studying these things on my own. 

https://www.radb.net/ -- you can query this for free, and it's a good way to 
look up network information without having to bounce around between ARIN, RIPE, 
APNIC, and other RIRs (Regional Internet Registries). You can do advanced 
queries against it with a Whois client, as well:

whois -h whois.radb.net -- '-i origin AS714'

The above command will give a list of everything originating from one of 
Apple's ASNs (Autonomous System Numbers; these are used to help manage 
routing). For example:

whois -h whois.radb.net -- '-i origin AS55185'

Gives:

route:          209.87.62.0/24
origin:         AS55185
descr:          750 - 555 Seymour Street
                Vancouver BC V6B-3H6
                Canada
admin-c:        HOSTM458-ARIN
tech-c:         NOC33711-ARIN
mnt-by:         MNT-BC-Z
created:        2023-12-07T21:58:41Z
last-modified:  2023-12-07T21:58:41Z
source:         ARIN
rpki-ov-state:  valid

route6:         2607:f8f0:6a0::/48
origin:         AS55185
descr:          750 - 555 Seymour Street
                Vancouver BC V6B-3H6
                Canada
admin-c:        HOSTM458-ARIN
tech-c:         NOC33711-ARIN
mnt-by:         MNT-BC
created:        2023-12-07T22:00:06Z
last-modified:  2023-12-07T22:00:06Z
source:         ARIN
rpki-ov-state:  valid

With a bit of scripting, it's not difficult to pull out the route: and route6: 
lines, run them through aggregate (a tool that removes duplication and 
shadowing of lists of netblocks, giving you the shortest possible list of 
netblocks that cover all of the provided addresses), and output them to a file 
for validation and addition to whatever solution you're using.
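A minimal version of that pipeline might look like this (the awk pattern and 
output filename are just for illustration; note that the classic aggregate 
tool is IPv4-only, so the route6: lines would need aggregate6 or similar):

```shell
# Pull the announced IPv4 prefixes for an ASN from RADb and
# collapse them into the shortest covering list of netblocks.
ASN=AS55185
whois -h whois.radb.net -- "-i origin $ASN" \
    | awk '$1 == "route:" { print $2 }' \
    | aggregate > "${ASN}-v4.txt"
```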

It's a huge topic, and I've already babbled long enough. I'm happy to give info 
or lend a hand, though. It's a hard problem.

Thank you,

Kev


-- 
Kev Woolley (they/them)

Gratefully acknowledging that I live and work in the unceded traditional 
territories of the Səl̓ílwətaɬ (Tsleil-Waututh) and Sḵwx̱wú7mesh Úxwumixw.



________________________________________
From: JonGeorg SageLibrary via Evergreen-general 
<[email protected]>
Sent: 19 March 2025 08:52
To: Evergreen Discussion Group
Cc: JonGeorg SageLibrary
Subject: [Evergreen-general] Bot issues

We've been dealing with a lot of bots crawling our catalog, and overwhelming 
our app servers.

Are any of you having the same issue, and if so what tools are you using to 
remedy the situation?

We've already implemented geoblocking to limit traffic to the US and Canada, 
after being overwhelmed by queries from overseas.

I've been looking at bad bot blocker as an option.
-Jon
_______________________________________________
Evergreen-general mailing list -- [email protected]
To unsubscribe send an email to [email protected]
