Hi Michael,

That opac-search.pl log entry looks like a bot that is really stuck. It looks 
like it would yield an ever-growing search query. 

Yeah, it seems most bots don't honour robots.txt anymore. 

Blocking bots that identify themselves in their user agent strings is certainly 
useful, but it won't block all the bots. Lyrasis has put together some good 
info on AI harvesting bots, for instance: 
https://wiki.lyrasis.org/display/cmtygp/Aggressive+AI+Harvesting+of+Digital+Resources

David Cook
Senior Software Engineer
Prosentient Systems
Suite 7.03
6a Glen St
Milsons Point NSW 2061
Australia

Office: 02 9212 0899

-----Original Message-----
From: Michael Kuhn <m...@adminkuhn.ch> 
Sent: Wednesday, 12 March 2025 8:08 AM
To: David Cook <dc...@prosentient.com.au>; 'Koha-devel' 
<koha-devel@lists.koha-community.org>
Subject: Re: [Koha-devel] oom-killer / Out of memory: Killed process 1026641 
(/usr/share/koha)

Hi David

Today you wrote:

 > I actually just discovered a bug with
 > /cgi-bin/koha/patroncards/create-pdf.pl which I'll be working on fixing
 > today which could cause resource exhaustion. I had some Starman workers
 > using obscene amounts of memory and CPU, and it's because they got
 > trapped in infinite loops trying to create labels/cards. Fun times...

It's not exactly the same but regarding infinite loops I sometimes find queries 
like the following in "plack.log" - they look very strange to me and I don't 
know what they are trying to do:

18.211.148.239 - - [11/Dec/2024:18:52:38 +0100] "GET 
/opac/opac-search.pl?count=20&limit=su-to:Verwaltung&q=ccl%3Dti,phr:(%22Agenda%22)%20and%20su-to:Berlin%20and%20su-to:Lokale%20Agenda%2021%20and%20itype:BUCH%20and%20((%20(allrecords,AlwaysMatches%3D%27%27)%20and%20(not-onloan-count,st-numeric%20%3E%3D%201)%20and%20(lost,st-numeric%3D0)%20))%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20itype:BUCH%20and%20itype:BUCH%20and%20su-to:Lokale%20Agenda%2021%20and%20((%20(allrecords,AlwaysMatches%3D%27%27)%20and%20(not-onloan-count,st-numeric%20%3E%3D%201)%20and%20(lost,st-numeric%3D0)%20))%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021&sort_by=relevance
HTTP/1.1" 302 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1)
AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 
(Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)"
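
One rough way to see what is odd about such a request is to decode the "q" 
parameter and count how often each clause repeats. The following is only a 
minimal sketch: the function name and the shortened sample path are 
hypothetical, and splitting on " and " is a naive heuristic that ignores 
parentheses.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def repeated_clauses(request_path):
    """Count how often each ' and '-joined clause occurs in the q parameter."""
    # parse_qs percent-decodes the values, so %20 becomes a space.
    params = parse_qs(urlsplit(request_path).query)
    q = params.get("q", [""])[0]
    # Naive split: does not understand nested parentheses in the CCL query.
    clauses = [c.strip() for c in re.split(r"\s+and\s+", q) if c.strip()]
    return Counter(clauses)


# Hypothetical, heavily shortened version of the request shown above.
sample = (
    "/opac/opac-search.pl?count=20"
    "&q=ccl%3Dti,phr:(%22Agenda%22)%20and%20su-to:Berlin"
    "%20and%20su-to:Lokale%20Agenda%2021"
    "%20and%20su-to:Lokale%20Agenda%2021"
    "%20and%20su-to:Lokale%20Agenda%2021"
    "&sort_by=relevance"
)

for clause, count in repeated_clauses(sample).most_common():
    print(count, clause)
```

A clause that shows up dozens of times in one request is a fairly strong 
hint that a crawler keeps following facet links on its own result pages.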

 > But in your case... I'm guessing it's probably due to bots. I've noticed
 > Starman instances that get a lot of bot hits balloon in memory usage. You
 > might want to look at your "plack_max_requests" in koha-conf.xml. Once
 > the Starman worker reaches its max requests, it'll be killed off, and the
 > memory released. But then there's some CPU overhead to starting up a new
 > Starman worker process. So there's a cost/benefit to do there.

That's the path I followed! Both plack_workers and plack_max_requests were at 
their default values. I have now increased plack_workers from 2 to 4 (the CPU 
core count), although it doesn't seem to have sped up page loads.

Sadly, most bots don't seem to take any notice of "robots.txt".

What really helped was blocking all the bots I could find in "plack.log" 
(ahrefs|Amazonbot|bingbot|ClaudeBot|DotBot|Googlebot|GPTBot|meta-externalagent)
using mod_rewrite. This stopped all the bot queries immediately, which also 
reduced the memory usage, and at least for now Koha seems to be working 
properly again.
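
To decide which names belong in such a block list, it can help to tally how 
many log lines each bot accounts for. The sketch below reuses the pattern 
listed above. The function name and the sample lines are hypothetical, 
matching case-insensitively is an assumption, and it searches the whole log 
line rather than only the user agent field.

```python
import re
from collections import Counter

# Same alternation as the block list above; IGNORECASE is an assumption.
BOT_PATTERN = re.compile(
    r"ahrefs|Amazonbot|bingbot|ClaudeBot|DotBot|Googlebot|GPTBot"
    r"|meta-externalagent",
    re.IGNORECASE,
)


def count_bot_hits(log_lines):
    """Return a Counter of matched bot names (lower-cased) over log lines."""
    hits = Counter()
    for line in log_lines:
        match = BOT_PATTERN.search(line)
        if match:
            hits[match.group(0).lower()] += 1
    return hits


# Hypothetical log lines, for illustration only.
sample_lines = [
    '192.0.2.1 - - "GET /opac/opac-search.pl?q=a HTTP/1.1" 200 0 "-" "Amazonbot/0.1"',
    '192.0.2.2 - - "GET /opac/opac-detail.pl HTTP/1.1" 200 0 "-" "GPTBot/1.0"',
    '192.0.2.3 - - "GET /opac/opac-main.pl HTTP/1.1" 200 0 "-" "Mozilla/5.0"',
    '192.0.2.1 - - "GET /opac/opac-search.pl?q=b HTTP/1.1" 200 0 "-" "Amazonbot/0.1"',
]

for name, count in count_bot_hits(sample_lines).most_common():
    print(name, count)
```

In practice the lines would come from iterating over the open "plack.log" 
file instead of a hard-coded list.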

 > It can be tough troubleshooting these things after the fact, so I'd
 > suggest putting in some monitoring, which alerts you once your memory
 > usage starts getting high. That way you can troubleshoot it more in real
 > time. That said, troubleshooting memory use can be tricky...

In addition, I have written a small script that sends me an e-mail whenever 
"/var/log/syslog" contains a new "Out of memory" message.
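
The script itself is not shown here, so the following is only a sketch of 
how the "new message" check could work, not the actual implementation. The 
function name and the line-count bookkeeping are assumptions, and sending 
the e-mail is left out.

```python
def new_oom_messages(lines, already_seen=0):
    """Return (matches, new_seen) for lines after the first already_seen.

    If the log now has fewer lines than were seen before, it is assumed to
    have been rotated and is scanned again from the start.
    """
    if already_seen > len(lines):
        already_seen = 0
    fresh = lines[already_seen:]
    matches = [line.rstrip("\n") for line in fresh if "Out of memory" in line]
    return matches, len(lines)


# Hypothetical syslog content, for illustration only.
log = [
    "kernel: some unrelated message\n",
    "kernel: Out of memory: Killed process 1026641 (/usr/share/koha)\n",
]

matches, seen = new_oom_messages(log)
for message in matches:
    print("ALERT:", message)
```

A caller would store the returned line count between runs (for example in a 
small state file) and pass it back in as already_seen on the next run.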

Many thanks for your suggestions!

Best wishes: Michael
--
Geschäftsführer · Diplombibliothekar BBS, Informatiker eidg. Fachausweis
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Schweiz
T 0041 (0)61 261 55 61 · E m...@adminkuhn.ch · W www.adminkuhn.ch

_______________________________________________
Koha-devel mailing list
Koha-devel@lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : https://www.koha-community.org/
git : https://git.koha-community.org/
bugs : https://bugs.koha-community.org/