Hi David
On 11 March you wrote:
> That opac-search.pl log looks like a bot that is really stuck. That
> looks like it would yield an ever growing search query.
>
> Yeah, most bots don't honour robots.txt anymore it seems.
>
> Blocking bots that self-identify with user agent strings is certainly
> useful, but it wouldn't block all the bots. Lyrasis has put together
> some good info on AI harvesting bots for instance:
> * https://wiki.lyrasis.org/display/cmtygp/Aggressive+AI+Harvesting+of+Digital+Resources
Many thanks for this link, which gives me some better insight, since many
of these bots are getting even more annoying than they already were.
Best wishes: Michael
--
Geschäftsführer · Diplombibliothekar BBS, Informatiker eidg. Fachausweis
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Schweiz
T 0041 (0)61 261 55 61 · E m...@adminkuhn.ch · W www.adminkuhn.ch
-----Original Message-----
From: Michael Kuhn <m...@adminkuhn.ch>
Sent: Wednesday, 12 March 2025 8:08 AM
To: David Cook <dc...@prosentient.com.au>; 'Koha-devel'
<koha-devel@lists.koha-community.org>
Subject: Re: [Koha-devel] oom-killer / Out of memory: Killed process 1026641
(/usr/share/koha)
Hi David
Today you wrote:
> I actually just discovered a bug with /cgi-bin/koha/patroncards/create-pdf.pl
> which I'll be working on fixing today which could cause resource exhaustion.
> I had some Starman workers using obscene amounts of memory and CPU, and it's
> because they got trapped in infinite loops trying to create labels/cards.
> Fun times...
It's not exactly the same, but regarding infinite loops: I sometimes find
queries like the following in "plack.log". They look very strange to me, and I
don't know what they are trying to do:
18.211.148.239 - - [11/Dec/2024:18:52:38 +0100] "GET
/opac/opac-search.pl?count=20&limit=su-to:Verwaltung&q=ccl%3Dti,phr:(%22Agenda%22)%20and%20su-to:Berlin%20and%20su-to:Lokale%20Agenda%2021%20and%20itype:BUCH%20and%20((%20(allrecords,AlwaysMatches%3D%27%27)%20and%20(not-onloan-count,st-numeric%20%3E%3D%201)%20and%20(lost,st-numeric%3D0)%20))%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20itype:BUCH%20and%20itype:BUCH%20and%20su-to:Lokale%20Agenda%2021%20and%20((%20(allrecords,AlwaysMatches%3D%27%27)%20and%20(not-onloan-count,st-numeric%20%3E%3D%201)%20and%20(lost,st-numeric%3D0)%20))%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021&sort_by=relevance
HTTP/1.1" 302 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1)
AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1;
+https://developer.amazon.com/support/amazonbot)"
> But in your case... I'm guessing it's probably due to bots. I've noticed
> Starman instances that get a lot of bot hits balloon in memory usage. You
> might want to look at your "plack_max_requests" in koha-conf.xml. Once the
> Starman worker reaches its max requests, it'll be killed off, and the memory
> released. But then there's some CPU overhead to starting up a new Starman
> worker process. So there's a cost/benefit analysis to do there.
That's the path I followed! plack_workers and plack_max_requests both showed
their default values. I have now increased plack_workers from 2 to 4 (the CPU
core count), even if it doesn't seem to have sped up page loads.
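For reference, the relevant part of koha-conf.xml now looks roughly like this
(an excerpt only; the plack_max_requests value shown is just a placeholder, as
it varies between installations):

```xml
<!-- koha-conf.xml (excerpt): Plack/Starman tuning -->
<!-- One Starman worker per CPU core -->
<plack_workers>4</plack_workers>
<!-- Each worker is killed and restarted after serving this many requests,
     releasing any memory it has accumulated -->
<plack_max_requests>50</plack_max_requests>
```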
Sadly, "robots.txt" doesn't really seem to influence most bots' behaviour.
What really helped was blocking all the bots I could find in "plack.log"
(ahrefs|Amazonbot|bingbot|ClaudeBot|DotBot|Googlebot|GPTBot|meta-externalagent)
using mod_rewrite. This stopped all the bot queries immediately and thus also
reduced memory usage, and at least for now Koha seems to work all right again.
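A minimal sketch of such a mod_rewrite block for the Apache virtual host
(the user-agent list is the one from above; it will need updating as new bots
appear, and bots that fake their user agent will still get through):

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Case-insensitive ([NC]) match on the User-Agent header
    RewriteCond %{HTTP_USER_AGENT} (ahrefs|Amazonbot|bingbot|ClaudeBot|DotBot|Googlebot|GPTBot|meta-externalagent) [NC]
    # Return 403 Forbidden for matching requests and stop processing rules
    RewriteRule .* - [F,L]
</IfModule>
```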
> It can be tough troubleshooting these things after the fact, so I'd suggest
> putting in some monitoring, which alerts you once your memory usage starts
> getting high. That way you can troubleshoot it more in real time. That said,
> troubleshooting memory use can be tricky...
In addition, I have written a small script which sends me an e-mail whenever
"/var/log/syslog" contains a new "Out of memory" message.
Many thanks for your suggestions!
Best wishes: Michael
--
Geschäftsführer · Diplombibliothekar BBS, Informatiker eidg. Fachausweis Admin
Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Schweiz T 0041 (0)61 261 55 61
· E m...@adminkuhn.ch · W www.adminkuhn.ch
_______________________________________________
Koha-devel mailing list
Koha-devel@lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : https://www.koha-community.org/
git : https://git.koha-community.org/
bugs : https://bugs.koha-community.org/