Hi Christina
You wrote:
> Koha 24.11.01
>
> Not strictly a Koha problem but something I know a lot of Koha users
> face. After years of running happily with fail2ban and robots.txt
> blocking bots/crawlers, the security seems to have passed. We've been
> getting more and more bots of late switching IPs before bans can take
> place, perhaps they could be ddos, either way grinding koha to a halt.
> I've had to switch OPACPublic to disable for now. I can't find much
> about securing a server against these types of hits. Does anyone else
> running a small server have any guidance on what could be done/the
> next steps? I'd ideally like to keep the OPAC public.
I recently opened a thread on the "koha-devel" mailing list about very
similar behaviour, which led to out-of-memory errors that caused Koha
to exit:
* https://lists.koha-community.org/pipermail/koha-devel/2025-March/048775.html
The following article (provided by David Cook) gives some insight into
what may actually be happening:
* https://wiki.lyrasis.org/display/cmtygp/Aggressive+AI+Harvesting+of+Digital+Resources
In my case I no longer rely on fail2ban and "robots.txt" to deal with
bots. There are way too many ever-changing IP addresses, and
"robots.txt" just seems to get ignored.
Instead what I did is the following:
1. In log file "/var/log/koha/<instancename>/plack.log" I investigated
the user agent strings of suspicious bots. I did this for three
libraries and I came up with the strings you'll find below.
Of course there may be more such bots. It also seems some bots have even
more wicked ways to harass the OPAC.
2. In configuration file
"/etc/apache2/sites-available/<instancename>.conf" I added the following
after the directive <VirtualHost *:443> which serves the Koha OPAC
(these are three lines):
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} (ahrefs|Amazonbot|applebot|bingbot|CensysInspect|ChatGPT|ClaudeBot|Custom-AsyncHttpClient|DotBot|DuckDuckBot|Go-http-client|Googlebot|GoogleOther|GPTBot|l9explore|meta-externalagent|MJ12bot|MetaJobBot|OAI-SearchBot|Odin|PerplexityBot|PetalBot|Qwantbot|SemrushBot|Turnitin) [NC]
RewriteRule ^(.*)$ - [F,L]
After inserting these lines I restarted the Apache HTTP Server.
3. This is not a perfect solution (read the article I linked above), but
at least performance improved considerably, and immediately.
And the bots identified by the given strings are definitely locked out.
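For step 1, here is a minimal Python sketch of how the user agent
strings in a log could be tallied. It is only an illustration, not the
exact procedure I used. It assumes that each log line ends with the user
agent as the last double-quoted field (as in the Apache "combined"
format); if your plack.log looks different, the pattern needs adjusting.
The function name count_user_agents is made up for this example.

```python
import re
from collections import Counter

# Assumption: the user agent is the last double-quoted field on each line,
# as in the Apache "combined" log format.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each user agent string to its number of hits."""
    counts = Counter()
    for line in lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match:  # lines without a trailing quoted field are skipped
            counts[match.group(1)] += 1
    return counts
```

To use it, open the log file, pass the file object to
count_user_agents(), and print the result of .most_common(20) to see the
busiest agents. Strings that show up with suspiciously high counts are
candidates for the RewriteCond pattern.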
Hope this helps.
Best wishes: Michael
--
Geschäftsführer · Diplombibliothekar BBS, Informatiker eidg. Fachausweis
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Schweiz
T 0041 (0)61 261 55 61 · E m...@adminkuhn.ch · W www.adminkuhn.ch