Hi David

On 11 March you wrote:

> That opac-search.pl log looks like a bot that is really stuck. That
> looks like it would yield an ever growing search query.
>
> Yeah, most bots don't honour robots.txt anymore it seems.
>
> Blocking bots that self-identify with user agent strings is certainly
> useful, but it wouldn't block all the bots. Lyrasis has put together
> some good info on AI harvesting bots for instance:
> * https://wiki.lyrasis.org/display/cmtygp/Aggressive+AI+Harvesting+of+Digital+Resources

Many thanks for this link, which gives me some better insight, since many of these bots are getting even more annoying than they already were.

Best wishes: Michael
--
Geschäftsführer · Diplombibliothekar BBS, Informatiker eidg. Fachausweis
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Schweiz
T 0041 (0)61 261 55 61 · E m...@adminkuhn.ch · W www.adminkuhn.ch



-----Original Message-----
From: Michael Kuhn <m...@adminkuhn.ch>
Sent: Wednesday, 12 March 2025 8:08 AM
To: David Cook <dc...@prosentient.com.au>; 'Koha-devel' 
<koha-devel@lists.koha-community.org>
Subject: Re: [Koha-devel] oom-killer / Out of memory: Killed process 1026641 
(/usr/share/koha)

Hi David

Today you wrote:

> I actually just discovered a bug with
> /cgi-bin/koha/patroncards/create-pdf.pl which I'll be working on fixing
> today which could cause resource exhaustion. I had some Starman workers
> using obscene amounts of memory and CPU, and it's because they got
> trapped in infinite loops trying to create labels/cards. Fun times...

It's not exactly the same, but speaking of infinite loops: I sometimes find 
queries like the following in "plack.log". They look very strange to me, and I 
don't know what they are trying to do:

18.211.148.239 - - [11/Dec/2024:18:52:38 +0100] "GET 
/opac/opac-search.pl?count=20&limit=su-to:Verwaltung&q=ccl%3Dti,phr:(%22Agenda%22)%20and%20su-to:Berlin%20and%20su-to:Lokale%20Agenda%2021%20and%20itype:BUCH%20and%20((%20(allrecords,AlwaysMatches%3D%27%27)%20and%20(not-onloan-count,st-numeric%20%3E%3D%201)%20and%20(lost,st-numeric%3D0)%20))%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20itype:BUCH%20and%20itype:BUCH%20and%20su-to:Lokale%20Agenda%2021%20and%20((%20(allrecords,AlwaysMatches%3D%27%27)%20and%20(not-onloan-count,st-numeric%20%3E%3D%201)%20and%20(lost,st-numeric%3D0)%20))%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021&sort_by=relevance
HTTP/1.1" 302 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1)
AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; 
+https://developer.amazon.com/support/amazonbot)"

> But in your case... I'm guessing it's probably due to bots. I've
> noticed Starman instances that get a lot of bot hits balloon in memory
> usage. You might want to look at your "plack_max_requests" in
> koha-conf.xml. Once the Starman worker reaches its max requests, it'll
> be killed off, and the memory released. But then there's some CPU
> overhead to starting up a new Starman worker process. So there's a
> cost/benefit to do there.

That's the path I followed! plack_workers and plack_max_requests both showed 
their default values. I have now increased the value of the tag plack_workers 
from 2 to 4 (the CPU core count), even if it doesn't seem to have boosted page 
loading.
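
For reference, the relevant lines in my "koha-conf.xml" now look roughly like 
this (the plack_max_requests value is simply what my installation showed as 
the default; adjust as needed). The change only takes effect after restarting 
Plack, e.g. with "koha-plack --restart <instancename>":

    <!-- Excerpt from /etc/koha/sites/<instance>/koha-conf.xml -->
    <!-- Number of Starman workers; I set it to the CPU core count. -->
    <plack_workers>4</plack_workers>
    <!-- A worker is recycled (freeing its memory) after this many
         requests; lower values trade CPU overhead for less memory. -->
    <plack_max_requests>50</plack_max_requests>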

Sadly, "robots.txt" doesn't really seem to impress most bot behavior.

What really helped was blocking all the bots I could find in "plack.log"
(ahrefs|Amazonbot|bingbot|ClaudeBot|DotBot|Googlebot|GPTBot|meta-externalagent)
using mod_rewrite. This stopped all the bot queries immediately and thus also 
reduced the memory usage, and at least for now Koha seems to work all right 
again.
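
The rule I used looks roughly like this (the list of user agents is simply 
what I happened to find in my own "plack.log") and goes into the Apache 
configuration of the OPAC virtual host:

    <IfModule mod_rewrite.c>
        RewriteEngine On
        # Match any of the bot names anywhere in the User-Agent header,
        # case-insensitively ([NC]) ...
        RewriteCond %{HTTP_USER_AGENT} (ahrefs|Amazonbot|bingbot|ClaudeBot|DotBot|Googlebot|GPTBot|meta-externalagent) [NC]
        # ... and answer 403 Forbidden ([F]) without rewriting anything.
        RewriteRule .* - [F,L]
    </IfModule>

Answering with a 403 directly in Apache is cheap compared to letting every 
bot request through to a Starman worker.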

> It can be tough troubleshooting these things after the fact, so I'd
> suggest putting in some monitoring, which alerts you once your memory
> usage starts getting high. That way you can troubleshoot it more in
> real time. That said, troubleshooting memory use can be tricky...

In addition, I have written a small script which sends me an e-mail whenever 
the file "/var/log/syslog" contains a new message regarding "Out of memory".

Many thanks for your suggestions!

Best wishes: Michael
--
Geschäftsführer · Diplombibliothekar BBS, Informatiker eidg. Fachausweis
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Schweiz
T 0041 (0)61 261 55 61 · E m...@adminkuhn.ch · W www.adminkuhn.ch




_______________________________________________
Koha-devel mailing list
Koha-devel@lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : https://www.koha-community.org/
git : https://git.koha-community.org/
bugs : https://bugs.koha-community.org/
