On Fri, Jul 04, 2025 at 10:40:22AM +0200, Felix Schwarz wrote:
>
> On 04.07.25 at 10:19, Michael J Gruber wrote:
> > I was wondering what other websites do. I mean, Fedora's are certainly
> > not the only ones being AI-scraped, and I hadn't heard of that being an
> > issue before. So there have to be practical solutions.
>
> This is a massive issue for all websites which serve some dynamic content
> and which get at least a bit of traffic.

You don't need much traffic for that to happen. I had a small git server
hosting some larger projects (Linux kernel, QEMU), with cgit set up for
browsing them. Not any more. And even weeks after taking cgit down, I have
days where Apache serves more than a million 404 pages (which luckily needs
far fewer resources than running cgit, so it doesn't take the server down).

> Basically these AI scrapers do not care about any restrictions like
> robots.txt or whatever. They try to access all pages and do so with
> ridiculous frequency.

I'd call that a DoS.

take care,
  Gerd

-- 
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue