Hi Christina

You wrote:

> Koha 24.11.01
>
> Not strictly a Koha problem, but something I know a lot of Koha users
> face. After years of running happily with fail2ban and robots.txt
> blocking bots/crawlers, that protection no longer seems to be enough.
> We've been getting more and more bots lately that switch IPs before
> bans can take place. Perhaps it is a DDoS; either way, it is grinding
> Koha to a halt. I've had to set OPACPublic to "Disable" for now. I
> can't find much about securing a server against this kind of traffic.
> Does anyone else running a small server have any guidance on what could
> be done or what the next steps are? I'd ideally like to keep the OPAC public.

I recently opened a thread on the "koha-devel" mailing list about very similar behaviour, which led to out-of-memory errors that caused Koha to exit:

* https://lists.koha-community.org/pipermail/koha-devel/2025-March/048775.html

The following article (provided by David Cook) gives some insight into what may actually be happening:

* https://wiki.lyrasis.org/display/cmtygp/Aggressive+AI+Harvesting+of+Digital+Resources

For bots, I no longer rely on fail2ban and "robots.txt". There are far too many ever-changing IP addresses, and "robots.txt" simply seems to be ignored.

Instead, I did the following:

1. In the log file "/var/log/koha/<instancename>/plack.log" I examined the user agent strings of suspicious bots. I did this for three libraries and came up with the strings listed below.

Of course there may be more such bots, and some bots seem to have even more devious ways of harassing the OPAC.
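For step 1, a short script can tally how often each user agent appears in plack.log, so the noisiest clients stand out. This is a minimal sketch, not necessarily how the strings were originally found: it assumes each log line ends with the user agent as the last double-quoted field (as in the Apache "combined" log format), and the file name count_agents.py is hypothetical.

```python
"""Count user agent strings in a Koha plack.log (hypothetical helper)."""
import re
import sys
from collections import Counter

# Assumption: each line ends with the user agent as the last
# double-quoted field, as in the Apache "combined" log format.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping user agent string -> number of requests."""
    counts = Counter()
    for line in lines:
        match = LAST_QUOTED.search(line)
        if match:  # lines without a trailing quoted field are skipped
            counts[match.group(1)] += 1
    return counts


def main(argv):
    # Usage: python3 count_agents.py /var/log/koha/<instancename>/plack.log [top_n]
    if len(argv) < 2:
        print("usage: count_agents.py LOGFILE [TOP_N]", file=sys.stderr)
        return
    path = argv[1]
    top_n = int(argv[2]) if len(argv) > 2 else 30
    with open(path, encoding="utf-8", errors="replace") as log:
        counts = count_user_agents(log)
    # Print the most frequent user agents first.
    for agent, hits in counts.most_common(top_n):
        print(f"{hits:8d}  {agent}")


if __name__ == "__main__":
    main(sys.argv)
```

A crawler that rotates its IP addresses usually keeps the same user agent, so it tends to show up near the top of this list even when no single IP looks suspicious.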

2. In the configuration file "/etc/apache2/sites-available/<instancename>.conf" I added the following three lines inside the <VirtualHost *:443> block that serves the Koha OPAC, right after its opening directive:

RewriteEngine on

RewriteCond %{HTTP_USER_AGENT} (ahrefs|Amazonbot|applebot|bingbot|CensysInspect|ChatGPT|ClaudeBot|Custom-AsyncHttpClient|DotBot|DuckDuckBot|Go-http-client|Googlebot|GoogleOther|GPTBot|l9explore|meta-externalagent|MJ12bot|MetaJobBot|OAI-SearchBot|Odin|PerplexityBot|PetalBot|Qwantbot|SemrushBot|Turnitin) [NC]

RewriteRule ^(.*)$ - [F,L]

After inserting these lines, I restarted the Apache HTTP Server.
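Before restarting Apache, it can help to check which user agents the pattern would actually catch. The following Python sketch reuses the alternation from the RewriteCond line and uses re.IGNORECASE in place of the [NC] flag. It assumes Python's regex engine treats this simple pattern the same way Apache's PCRE does, and the sample user agents are made up for illustration.

```python
"""Check which user agents the RewriteCond pattern would block (sketch)."""
import re

# Same alternation as the RewriteCond line; re.IGNORECASE stands in for [NC].
# Assumption: Python's re behaves like Apache's PCRE for this simple pattern.
BLOCKED = re.compile(
    r"(ahrefs|Amazonbot|applebot|bingbot|CensysInspect|ChatGPT|ClaudeBot|"
    r"Custom-AsyncHttpClient|DotBot|DuckDuckBot|Go-http-client|Googlebot|"
    r"GoogleOther|GPTBot|l9explore|meta-externalagent|MJ12bot|MetaJobBot|"
    r"OAI-SearchBot|Odin|PerplexityBot|PetalBot|Qwantbot|SemrushBot|Turnitin)",
    re.IGNORECASE,
)


def is_blocked(user_agent):
    """True if the user agent contains any listed string (unanchored, like RewriteCond)."""
    return BLOCKED.search(user_agent) is not None


if __name__ == "__main__":
    # Hypothetical sample user agents, for illustration only.
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
        "Go-http-client/1.1",
        "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    ]
    for agent in samples:
        # "403" mirrors the [F] flag, which makes Apache answer 403 Forbidden.
        print("403" if is_blocked(agent) else "ok ", agent)
```

Because the condition is an unanchored, case-insensitive substring match, short entries can catch more than intended: "Odin", for example, also matches any user agent containing the word "Coding".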

3. This is not a perfect solution (see the article linked above), but performance improved considerably right away, and the bots identified by these strings are definitely locked out.

Hope this helps.

Best wishes: Michael
--
Geschäftsführer · Diplombibliothekar BBS, Informatiker eidg. Fachausweis
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Schweiz
T 0041 (0)61 261 55 61 · E m...@adminkuhn.ch · W www.adminkuhn.ch
_______________________________________________

Koha mailing list  http://koha-community.org
Koha@lists.katipo.co.nz
Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
