> [...] fairly typical load but I've documented surges upwards of 2500+
> new TCP connections per second; I typically end up banning an entire
> /16 or two to recover my VM when it happens.

One of the front-runners in my mind for why I'm not being DDoSed similarly is that my main house router has a reject list that blocks misbehaving IPs automatically for a week (currently 16077 IPs, typical these days).

My border router also has a different list, manually maintained, which blocks netblocks in three broad categories:

(1) Blocks which appear to think there is such a thing as (in the words of one netblock's remarks) "scanning for LEGIT purposes". Perhaps the most notable on this list is UCBerkeley(!).

(2) Blocks which appear to be "please volunteer _your_ resources to improve _our_ commercial offerings" outfits. An example is deepfield.net.

(3) Other bad actors. An example is Digital Ocean, which apparently can't be bothered to staff their abuse desk commensurate with the level of abuse they emit (i.e., trying to get the rest of the net to take on some of the costs of their abuse desk - their abuse autoresponse indicates that reports not formatted to their specs aren't read, yet they can't be bothered to even point to an explanation of those specs; they handwave "tools such as ...").

Identifiable LLM crawlers would fall into (2) and, in the problematic cases at hand (misrepresenting themselves, no rate limiting, scattershot source addresses, etc.), (3).

This list covers 159611 IPs; its minimal CIDR representation is 52 blocks. (That's for IPv4. The IPv6 list is 20 CIDR blocks covering 795088750969575521173945450512 IPs, not a useful number; it consists of a /29 and two /32s, with everything else down in the noise: eight /40s, a /44, two /48s, four /64s, and a /124.)

Actually, most offenders of type (1) just go into the automated list, because I don't use the top and bottom addresses of my netblock for anything but scanner sentinels; anyone trying to access them goes into the automated list. Most address-range scanners hit this. Only the ones that are visible enough to get human handling ever go into the manually-maintained list.

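
For anyone wanting to experiment with the same ideas, here is a minimal sketch in Python of the two mechanisms described above: sentinel addresses feeding a ban list whose entries expire after a week, and collapsing a hand-kept list of netblocks to its minimal CIDR form. It is not the code running on the routers in question; the names (AutoBanList, observe, summarize) and the documentation-range addresses in the usage note are hypothetical, and a real deployment would push the bans into the packet filter rather than keep them in a dict.

```python
import ipaddress
import time

BAN_SECONDS = 7 * 24 * 3600  # one week, matching the ban period in the post


class AutoBanList:
    """Hypothetical automated ban list driven by sentinel addresses."""

    def __init__(self, netblock, ban_seconds=BAN_SECONDS):
        net = ipaddress.ip_network(netblock)
        # Assumption: the sentinels are the lowest and highest addresses
        # of the block, which carry no legitimate traffic.
        self.sentinels = {net[0], net[-1]}
        self.ban_seconds = ban_seconds
        self.expiry = {}  # source address -> time at which the ban lapses

    def observe(self, src, dst, now=None):
        """Record one connection attempt; return True if src is banned."""
        now = time.time() if now is None else now
        src = ipaddress.ip_address(src)
        if ipaddress.ip_address(dst) in self.sentinels:
            self.expiry[src] = now + self.ban_seconds
        return self.is_banned(src, now)

    def is_banned(self, src, now=None):
        """Return True while src has an unexpired ban; drop lapsed entries."""
        now = time.time() if now is None else now
        src = ipaddress.ip_address(src)
        expires = self.expiry.get(src)
        if expires is None:
            return False
        if expires <= now:
            del self.expiry[src]
            return False
        return True


def summarize(blocks):
    """Collapse same-family netblocks to minimal CIDR form and count addresses."""
    nets = [ipaddress.ip_network(b) for b in blocks]
    collapsed = list(ipaddress.collapse_addresses(nets))
    return collapsed, sum(n.num_addresses for n in collapsed)
```

With a hypothetical block of 192.0.2.0/24, a call such as AutoBanList("192.0.2.0/24").observe("198.51.100.7", "192.0.2.255") bans the source because it touched the top address, while traffic to an ordinary address in the block does not. summarize() takes one address family at a time, since ipaddress.collapse_addresses refuses to mix IPv4 and IPv6.
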
Another possible reason is that I don't speak HTTPS; I consider it plausible the LLM scrapers have drunk the "HTTPS is the One True Way" koolaid and aren't even trying HTTP. Some of the port-80 connections that proceed to send me binary garbage may be attempts to initiate HTTPS (even though it's the HTTP port); whatever they are, they get dropped into the automated ban list along with anything else sending something I don't recognize in the position of an HTTP verb.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B