Hi Michael,

That opac-search.pl log entry looks like a bot that is really stuck. It looks like it would yield an ever-growing search query.
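One rough way to spot requests like this in a log is to count how often the same clause repeats inside the "q" parameter. The following is only a minimal Python sketch, not anything that ships with Koha: the function name repeated_clauses and the threshold of 5 are made up for illustration, and it assumes the clauses are joined by " and " as in the request quoted further down.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def repeated_clauses(request_target, threshold=5):
    """Return {clause: count} for clauses that occur at least `threshold`
    times in the q parameter of a request target such as
    "/opac/opac-search.pl?q=...".

    Hypothetical helper: the name and the default threshold are
    illustrative assumptions, not part of Koha.
    """
    query = urlsplit(request_target).query
    # parse_qs percent-decodes the values; q may occur more than once.
    q_values = parse_qs(query).get("q", [])
    counts = Counter()
    for q in q_values:
        # Assumption: clauses are joined by " and ", as in the logged query.
        for clause in q.split(" and "):
            counts[clause.strip()] += 1
    return {clause: n for clause, n in counts.items() if n >= threshold}
```

A request whose q value repeats the same "su-to:" clause dozens of times would come back with that clause and its count, while an ordinary search returns an empty dict.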
Yeah, it seems most bots don't honour robots.txt anymore. Blocking bots that self-identify with user agent strings is certainly useful, but it won't block all of them. Lyrasis has put together some good info on AI harvesting bots, for instance: https://wiki.lyrasis.org/display/cmtygp/Aggressive+AI+Harvesting+of+Digital+Resources

David Cook
Senior Software Engineer
Prosentient Systems
Suite 7.03
6a Glen St
Milsons Point NSW 2061
Australia

Office: 02 9212 0899

-----Original Message-----
From: Michael Kuhn <m...@adminkuhn.ch>
Sent: Wednesday, 12 March 2025 8:08 AM
To: David Cook <dc...@prosentient.com.au>; 'Koha-devel' <koha-devel@lists.koha-community.org>
Subject: Re: [Koha-devel] oom-killer / Out of memory: Killed process 1026641 (/usr/share/koha)

Hi David

Today you wrote:

> I actually just discovered a bug with /cgi-bin/koha/patroncards/create-pdf.pl
> which I'll be working on fixing today which could cause resource exhaustion.
> I had some Starman workers using obscene amounts of memory and CPU, and it's
> because they got trapped in infinite loops trying to create labels/cards.
> Fun times...

It's not exactly the same but regarding infinite loops I sometimes find queries like the following in "plack.log" - they look very strange to me and I don't know what they are trying to do: 18.211.148.239 - - [11/Dec/2024:18:52:38 +0100] "GET /opac/opac-search.pl?count=20&limit=su-to:Verwaltung&q=ccl%3Dti,phr:(%22Agenda%22)%20and%20su-to:Berlin%20and%20su-to:Lokale%20Agenda%2021%20and%20itype:BUCH%20and%20((%20(allrecords,AlwaysMatches%3D%27%27)%20and%20(not-onloan-count,st-numeric%20%3E%3D%201)%20and%20(lost,st-numeric%3D0)%20))%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20su-to:Lokale%20Agenda%2021%20and%20itype:BUCH%20and%20itype:BUCH%20and%20su-to:Lokale%20Agenda%2021%20and%20((%20(allrecords,AlwaysMatches%3D%27%27)%20and%20(not-onloan-count,st-numeric%20%3E%3D%201)%20and%20(lost,st-numeric%3D0)%20))%20and%20su-to:Lokale%20Agenda%2
021%20and%20su-to:Lokale%20Agenda%2021&sort_by=relevance HTTP/1.1" 302 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)"

> But in your case... I'm guessing it's probably due to bots. I've noticed
> Starman instances that get a lot of bot hits balloon in memory usage. You
> might want to look at your "plack_max_requests" in koha-conf.xml. Once
> the Starman worker reaches its max requests, it'll be killed off, and the
> memory released. But then there's some CPU overhead to starting up a new
> Starman worker process. So there's a cost/benefit to do there.

That's the path I followed! Both plack_workers and plack_max_requests were at their default values. I have now increased plack_workers from 2 to 4 (the CPU core count), even though it doesn't seem to have made pages load any faster.

Sadly, most bots don't seem to be impressed by "robots.txt". What really helped was blocking all the bots I could find in "plack.log" (ahrefs|Amazonbot|bingbot|ClaudeBot|DotBot|Googlebot|GPTBot|meta-externalagent) using mod_rewrite. This stopped the bot queries immediately and thus also reduced memory usage, and at least for now Koha seems to work all right again.

> It can be tough troubleshooting these things after the fact, so I'd
> suggest putting in some monitoring, which alerts you once your memory
> usage starts getting high. That way you can troubleshoot it more in
> real time. That said, troubleshooting memory use can be tricky...

In addition, I have written a small script which sends me an e-mail if "/var/log/syslog" contains a new message regarding "Out of memory".

Many thanks for your suggestions!

Best wishes: Michael

--
Geschäftsführer · Diplombibliothekar BBS, Informatiker eidg.
Fachausweis
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Schweiz
T 0041 (0)61 261 55 61 · E m...@adminkuhn.ch · W www.adminkuhn.ch
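
The syslog check described in the message above could look roughly like the following. This is not Michael's actual script, which was not posted; it is a minimal Python sketch under stated assumptions. The function name new_oom_lines and the idea of remembering a byte offset between runs are hypothetical, and sending the e-mail is left to the caller.

```python
import os


def new_oom_lines(log_path, offset=0, needle="Out of memory"):
    """Return (matching_lines, new_offset) for text appended since `offset`.

    Hypothetical helper: the caller is assumed to store `new_offset`
    between runs (for example in a small state file) and to send the
    e-mail itself whenever `matching_lines` is not empty.
    """
    with open(log_path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        if size < offset:
            # The file is shorter than last time, so it was probably
            # rotated or truncated: read it from the beginning again.
            offset = 0
        fh.seek(offset)
        data = fh.read()
    text = data.decode("utf-8", errors="replace")
    matches = [line for line in text.splitlines() if needle in line]
    return matches, offset + len(data)
```

Run from cron, each call only looks at what was appended since the previous call, so the same "Out of memory" line should not trigger a second alert unless the log is rotated in between.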