On Fri, 2025-07-04 at 12:48 +0200, Marius Schwarz wrote:
> On 04.07.25 at 12:00, Gerd Hoffmann wrote:
> > > Basically these AI scrapers do not care about any restrictions like
> > > robots.txt or whatever. They try access all pages and do so with
> > > ridiculous frequency.
> > I'd name that DoS.
>
> "Do not talk with terrorists" .. block their entire networks. It's what
> we in our datacenter do.
> Sometimes they use a Class C for the GET request and another Class C
> for the POST request (in our case a WP cluster).
>
> We stopped blocking ip by ip, we use /24 blocks now.
>
> Their entire business model is based on our data, so if we stop that
> data flow, we hit them in the long term.
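
For illustration only, the move from per-IP blocking to /24 blocking described in the quoted text could look roughly like the sketch below. It is not anyone's actual tooling: the function name `to_slash24_blocks` is hypothetical, and the addresses are placeholders from the RFC 5737 documentation ranges. It uses only Python's standard `ipaddress` module.

```python
import ipaddress
from collections import Counter


def to_slash24_blocks(offending_ips):
    """Map each offending IPv4 address to its enclosing /24 network
    and count how many offenders fall into each block.

    Hypothetical helper for illustration; not taken from the thread.
    """
    counts = Counter()
    for ip in offending_ips:
        # strict=False lets us pass a host address and get the
        # containing network back instead of a ValueError.
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        counts[net] += 1
    return counts


# Placeholder addresses (RFC 5737 documentation ranges), not real scrapers.
offenders = ["192.0.2.10", "192.0.2.77", "198.51.100.5"]
blocks = to_slash24_blocks(offenders)

for net, hits in sorted(blocks.items()):
    # num_addresses shows how many addresses the block rule covers,
    # regardless of how many of them actually misbehaved.
    print(f"{net}: {hits} offending IPs, {net.num_addresses} addresses blocked")
```

Each /24 rule covers 256 addresses no matter how few of them were seen scraping, which is where the collateral damage discussed in this thread comes from.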

But then you'll be blocking tons of innocent users, for the reason Tom
noted:

"The problem is that isn't a few big netblocks from big AI companies,
as they are relatively easy to deal with, rather it's fly by night
outfits scraping using rented proxy networks so the IPs are all over
the place."

The most problematic scraping isn't coming from easily-identifiable
corporate networks owned by the scrapers. It's coming, essentially,
from rented botnets. People used to rent botnets to send spam; now
they're renting them to do scraping (and presumably sell the resulting
data to middleman outfits who can launder it to the big AI outfits
while everyone gets to maintain plausible deniability about where it
came from).

The individual hosts in these botnets are just regular people on
normal residential or cellular networks, so if you block the entire
network, you've just blocked 10,000 regular people from visiting your
site.

If you're not going to use something like Cloudflare or Anubis,
sometimes you do *have* to do this just to keep the site up - we have
blocked the entirety of Brazil from Fedora infra a couple of times so
far (since, as Jelle noted, for some reason a lot of this traffic
comes from Brazil) - but it's not exactly "optimal". There really
aren't any good choices here.
-- 
Adam Williamson (he/him/his)
Fedora QA
Fedora Chat: @adamwill:fedora.im | Mastodon: @ad...@fosstodon.org
https://www.happyassassin.net