Hello Fridolin,

Yes, performance is probably important for MQ; it does seem to eat a disproportionate amount of resources sometimes.  :)

But like I said, our server is a beast: every aspect of its build uses the best components available in 2022.

Let's see about the race condition:

 * An update is done on a biblio
 * An update_elastic_index job is created in the db.
 * Its ID is pushed onto MQ
 * background_jobs_worker.pl picks up the ID from MQ
     o it goes to the DB, and finds nothing with that ID.  We get a
       pointer error (yeah, I come from C)
     o This is NOT an old forgotten floating job, since we can see the
       job in the database when looking manually.
     o The job stays there forever, with status 'new'.
 * If I add a "sleep 1", this issue _mostly_ disappears.

There's no server performance issue that could explain this.  Maybe some DB caching?
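
For what it's worth, the direction I'm considering instead of the flat "sleep 1" is to retry the lookup a few times before giving up. Untested sketch, assuming the usual Koha::BackgroundJobs->find lookup (names may differ from the real worker code):

    use Modern::Perl;
    use Time::HiRes qw( usleep );
    use Koha::BackgroundJobs;

    # Poll the DB a few times instead of sleeping a flat second;
    # the row is usually visible within a fraction of a second.
    sub fetch_job_with_retry {
        my ($job_id) = @_;
        for my $attempt ( 1 .. 10 ) {
            my $job = Koha::BackgroundJobs->find($job_id);
            return $job if $job;
            usleep(100_000);    # 100 ms between attempts, ~1 s total
        }
        return;                 # still nothing: let the caller log and skip
    }

Not pretty either, but at least it only waits when the row is actually late.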


Philippe Blouin,
Directeur de la technologie

Tél.  : (833) 465-4276, poste 230
philippe.blo...@inlibro.com

inLibro | pour esprit libre | www.inLibro.com <http://www.inLibro.com>
On 2022-12-21 13:06, Fridolin SOMERS wrote:
Hi,

I think network performance is really important for RabbitMQ.
We at Biblibre run a dedicated virtual machine for it on each physical server, shared between that server's virtual machines (one Koha per machine), which keeps network performance good.
It seems to work well, but we are still on 21.11.

Best regards,

On 2022-12-20 09:13, Philippe Blouin wrote:
Howdy!

Since moving a lot of our users to 22.05.06, we've installed the worker everywhere.  But the number of issues encountered is staggering.

The first one was

Can't call method "process" on an undefined value

where the ID received from MQ is not found in the DB, so the worker goes straight to process_job and fails. I have absolutely no idea how that occurs; it seems completely counterintuitive (the ID comes from the DB after all), but there it is.  I hacked the code to add a "sleep 1", which fixes most of that one.
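
To give an idea of the failing path, here it is paraphrased from memory (untested fragment; variable and field names may differ from the real background_jobs_worker.pl):

    # Fragment of the worker loop; $frame is the STOMP frame just
    # received from RabbitMQ.
    use JSON qw( decode_json );
    use Koha::BackgroundJobs;

    my $args = decode_json( $frame->body );
    sleep 1;    # the hack: give the DB a moment to make the row visible
    my $job  = Koha::BackgroundJobs->find( $args->{job_id} );
    if ( !$job ) {
        # Without a guard we fall straight through to $job->process and die
        # with "Can't call method 'process' on an undefined value".
        warn "Job $args->{job_id} announced on MQ but not found in the DB";
    }
    else {
        $job->process($args);
    }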

Then came the fact that jobs already stored in the DB are not checked at startup when the connection to MQ succeeds.  Bug 30654 refers to it. I hacked a little "$init" flag in there to clear those up at startup.
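
The "$init" band-aid looks more or less like this (untested sketch; column and method names are from memory, the real worker code is a bit more involved):

    use JSON qw( decode_json );
    use Koha::BackgroundJobs;

    my $init = 1;
    if ($init) {
        # On startup, drain the jobs that were stored while MQ was
        # unreachable instead of leaving them 'new' forever.
        my $pending = Koha::BackgroundJobs->search( { status => 'new' } );
        while ( my $job = $pending->next ) {
            $job->process( decode_json( $job->data ) );
        }
        $init = 0;
    }
    # ...then carry on with the normal MQ receive loop...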

Then came the

malformed UTF-8 character in JSON string, at character offset 296 (before "\x{e9}serv\x{e9} au ...")

at decode_json, which crashes the whole process.  And for some reason it never gets over it: it hits the same problem at every restart, as if the event is never "eaten" from the queue.  I hacked an eval, then a try-catch, over it...
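
The try-catch itself is nothing fancy, something like this (untested sketch):

    use Try::Tiny;
    use JSON qw( decode_json );

    # Decode the frame body defensively: return undef instead of dying,
    # so one poisoned message cannot take the whole worker down.
    sub safe_decode {
        my ($body) = @_;
        return try {
            decode_json($body);
        }
        catch {
            warn "Failed to decode MQ payload: $_";
            return;
        };
    }

The worker then skips the message when safe_decode returns undef; the part I am still not sure about is acknowledging the bad frame so the broker stops redelivering it at every restart.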

After coding a monitor to alert when a background_jobs row has been "new" for over 5 minutes in the DB, I was inundated with messages. There's always one elasticsearch_update that escapes among the flurry, and they slowly add up.
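
The monitor itself boils down to one query against the DB, roughly (untested sketch; DSN and credentials are placeholders, column names taken from a stock 22.05 schema, double-check them):

    use Modern::Perl;
    use DBI;

    # Placeholders: point this at the real Koha database.
    my $dbh = DBI->connect( 'dbi:mysql:database=koha', 'koha_user', 'secret',
        { RaiseError => 1 } );

    my $stuck = $dbh->selectall_arrayref( q{
        SELECT id, type, enqueued_on
          FROM background_jobs
         WHERE status = 'new'
           AND enqueued_on < NOW() - INTERVAL 5 MINUTE
    }, { Slice => {} } );

    printf "%d job(s) stuck in 'new' for more than 5 minutes\n", scalar @$stuck
        if @$stuck;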

At this point, the only viable solution is to run the workers but disable RabbitMQ everywhere.  Are we really the only ones experiencing that?

Regards,

PS: Our servers are well-above-average Debian 11 machines with lots of firepower (RAM, CPU, I/O...).

--
Philippe Blouin,
Directeur de la technologie

Tél.  : (833) 465-4276, poste 230
philippe.blo...@inlibro.com

inLibro | pour esprit libre | www.inLibro.com <http://www.inLibro.com>

_______________________________________________
Koha-devel mailing list
Koha-devel@lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : https://www.koha-community.org/
git : https://git.koha-community.org/
bugs : https://bugs.koha-community.org/
