On Fri, 2025-07-04 at 12:48 +0200, Marius Schwarz wrote:
> On 04.07.25 at 12:00, Gerd Hoffmann wrote:
> > > Basically these AI scrapers do not care about any restrictions like
> > > robots.txt or whatever. They try to access all pages and do so with
> > > ridiculous frequency.
> > I'd name that DoS.
> 
> "Do not talk with terrorists" .. block their entire networks. It's what
> we do in our datacenter.
> Sometimes they use one Class C for the GET request and another Class C
> for the POST request (in our case, a WP cluster).
> 
> We stopped blocking IP by IP; we use /24 blocks now.
> 
> Their entire business model is based on our data, so if we stop that
> data flow, we hit them in the long term.

But then you'll be blocking tons of innocent users, for the reason Tom
noted:

"The problem is that it isn't a few big netblocks from big AI companies,
as they are relatively easy to deal with, rather it's fly-by-night
outfits scraping using rented proxy networks so the IPs are all over
the place."

The most problematic scraping isn't coming from easily-identifiable
corporate networks owned by the scrapers. It's coming, essentially,
from rented botnets. People used to rent botnets to send spam, now
they're renting them to do scraping (and presumably sell the resulting
data to middleman outfits who can launder it to the big AI outfits
while everyone gets to maintain plausible deniability about where it
came from).

The individual hosts in these botnets are just regular people on normal
residential or cellular networks, so if you block the entire network,
you just blocked 10,000 regular people from visiting your site.
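
To put a rough number on that collateral, here is a minimal sketch,
assuming IPv4 only and a plain list of offending addresses pulled from
your logs. The function names and the sample addresses (taken from the
reserved documentation ranges) are made up for illustration; this is not
what Fedora infra or anyone in this thread actually runs.

```python
import ipaddress
from collections import defaultdict


def group_by_slash24(offending_ips):
    """Map each covering /24 network to the set of offending addresses in it."""
    groups = defaultdict(set)
    for ip in offending_ips:
        addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad or IPv6 input
        # strict=False lets us pass a host address and get its enclosing /24
        net = ipaddress.ip_network(f"{addr}/24", strict=False)
        groups[net].add(addr)
    return groups


def collateral_addresses(groups):
    """Count addresses that a /24 block would cover but that never misbehaved."""
    return sum(net.num_addresses - len(seen) for net, seen in groups.items())


# Hypothetical sample: four scraper hits spread over three /24 networks.
sample = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "203.0.113.200"]
groups = group_by_slash24(sample)
print(len(groups))                   # 3 networks to block
print(collateral_addresses(groups))  # 764 addresses blocked that were never seen
```

With scraper hosts scattered across residential and cellular ranges,
nearly every address covered by each block belongs to someone who did
nothing wrong, and carrier NAT can put many users behind each one.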

If you're not going to use something like Cloudflare or Anubis,
sometimes you do *have* to do this just to keep the site up - we have
blocked the entirety of Brazil from Fedora infra a couple of times so
far (since, as Jelle noted, for some reason a lot of this traffic comes
from Brazil) - but it's not exactly "optimal". There really aren't any
good choices here.
-- 
Adam Williamson (he/him/his)
Fedora QA
Fedora Chat: @adamwill:fedora.im | Mastodon: @ad...@fosstodon.org
https://www.happyassassin.net



-- 
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue
