On Fri, Jul 04, 2025 at 10:40:22AM +0200, Felix Schwarz wrote:
> 
> On 04.07.25 at 10:19, Michael J Gruber wrote:
> > I was wondering what other websites do. I mean, Fedora's are certainly
> > not the only ones being AI-scraped, and I hadn't heard of that being an
> > issue before. So there have to be practical solutions.
> 
> This is a massive issue for all websites which serve some dynamic content
> and which get at least a bit of traffic.

You don't need much traffic for that to happen.  I had a small git
server hosting some larger projects (Linux kernel, QEMU) plus a cgit
setup for browsing them.  Not any more.

And even weeks after taking down cgit, I have days where Apache serves
more than a million 404 pages (which luckily takes far fewer resources
than running cgit, so it doesn't take out the server).

> Basically these AI scrapers do not care about any restrictions like
> robots.txt or whatever. They try to access all pages and do so with
> ridiculous frequency.

I'd call that a DoS.

take care,
  Gerd

-- 
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
