While the queue is being processed, the donor seems to block all queries. At 
least that is what it looked like, but maybe it is just even slower.

I’m not sure what the cause is, but I only notice this problem after a server 
restart, be it an SST or an IST.

We have scripts running every 1, 2, 3 and 5 minutes that process data on the 
DB, and for about 30 minutes after a server start I have to kill them or 
disable cron altogether to avoid worsening the issue. At some point I believed 
that was enough to cope with this slowness, but the fact is that it is not.
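
Instead of killing the scripts by hand, each script could start with a guard 
that checks the node state. The following is only a minimal sketch in Python, 
not something tested on this cluster. It assumes the caller has already read 
the status variables wsrep_ready, wsrep_local_state_comment and Uptime (for 
example from SHOW GLOBAL STATUS) into a dictionary. The function name 
should_run and the 30-minute grace period are assumptions for illustration.

```python
# Hypothetical guard for the periodic scripts: run only when the local
# Galera node reports itself ready, synced, and past a warm-up period.

def should_run(status, min_uptime_s=1800):
    """Return True if a cron job may run against this node.

    status: dict of status variable name -> string value, as read from
    SHOW GLOBAL STATUS by the caller (fetching is not shown here).
    min_uptime_s: assumed grace period after a server start (30 minutes).
    """
    ready = str(status.get("wsrep_ready", "OFF")).upper() == "ON"
    synced = status.get("wsrep_local_state_comment", "") == "Synced"
    try:
        uptime = int(status.get("Uptime", 0))
    except (TypeError, ValueError):
        # Treat a missing or malformed value as "just started".
        uptime = 0
    return ready and synced and uptime >= min_uptime_s
```

A script would exit early when should_run(...) returns False, so a node that 
is still a donor, a joiner, or freshly restarted is left alone.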

We are processing between 5,000 and 10,000 queries per second. In “normal 
circumstances”, a single server is enough. But after a single server start, 
even 4 servers cannot keep up. If I restart 2 servers, the issue is even more 
dramatic.

Sorry, I’m trying to focus on the cause, but apart from restarting a server 
I see no other trigger for the issue.

I updated Ubuntu from 21.10 to 22.04 at night 3 days ago, and all servers did 
an IST, but the slowness still occurred, even though there is little traffic 
at night, something like 2,000 queries per second. All 4 servers ended up with 
100+ queries stuck for entire minutes!


Is there a way to avoid dramatic slowness on server start?

I’ve read that optimizer_search_depth can cause slow queries when set to a 
value other than 0, but regardless of the value I set for it, the issue is 
exactly the same, so it’s currently set to 0.


From: William Edwards <wedwa...@cyberfusion.nl>
Sent: Wednesday, 27 July 2022 12:45
To: Cédric Counotte <cedric.couno...@1check.com>
Cc: maria-discuss@lists.launchpad.net
Subject: Re: [Maria-discuss] MariaDB server horribly slow on start

Hi,
On 27 Jul 2022, at 12:37, Cédric Counotte <cedric.couno...@1check.com> wrote:

Thanks for your reply!

If the server does an SST, the problem is way more dramatic than when it does 
an IST.

This morning one server crashed and upon restarting it did an SST instead of an 
IST, and the issue was horrible.
Even before becoming available, it blocked the donor for 15 minutes with 
messages like this one:

2022-07-27 12:02:42 7 [Note] WSREP: Processing event queue:... 20.9% ( 496/2376 
events) complete.

Does the issue occur while these messages are logged?

For a while the queue was even growing faster than it was being processed.
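
Whether the queue grows faster than it is processed can be checked from the 
progress lines themselves. The following is a rough Python sketch that assumes 
every progress line has the same shape as the single line quoted above (the 
only example available here). The names parse_progress, backlog and 
backlog_growing are made up for illustration.

```python
import re
from datetime import datetime

# Pattern derived from the one log line quoted in this thread; other
# server versions may format the message differently (assumption).
PROGRESS_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s.*"
    r"WSREP: Processing event queue:.*?"
    r"\(\s*(?P<done>\d+)\s*/\s*(?P<total>\d+)\s+events\)"
)

def parse_progress(line):
    """Return (timestamp, events_done, events_total), or None if no match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    return ts, int(m.group("done")), int(m.group("total"))

def backlog(sample):
    """Number of events still waiting in the queue for one parsed sample."""
    _, done, total = sample
    return total - done

def backlog_growing(first, last):
    """True if more events are pending at the later sample than at the earlier one."""
    return backlog(last) > backlog(first)
```

For the quoted line, parse_progress yields 496 processed events out of 2376, 
so 1880 events were still pending at 12:02:42. Comparing two such samples 
taken a minute apart shows whether the backlog is shrinking.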

The same server crashed again, so I started another one and it did an SST. The 
problem was not as dramatic, but processing the event queue lasted 5 minutes 
and blocked the donor completely for that time. On very rare occasions the SST 
does not cause such issues (twice in 6 months, out of 2 or 3 dozen occurrences 
of the issue), and I have not changed any settings since!? Very confusing.

When servers do an SST, I usually kill the CHECK TABLE FOR UPGRADE that occurs 
as it appears to slow things down even more.

Notably, this morning I had 3 servers running, one went haywire and caused 
another one to go down! I ended up with a single server, which I had to restart 
because it complained about not being wsrep ready.


It’s been a very bad day today as those 4 servers are in production and we 
received dozens of calls from our customers.

Again, I’d focus on cause. The effect is clear.


Now I’m back to 2 servers and will wait until tonight to restart the 2 others 
because of that issue.

IMO it’s a bug, as on very rare occasions it starts smoothly. But still, I have 
found Galera to be unreliable and my company is asking me to install a more 
reliable solution ASAP or we will lose customers! So any help would be much 
appreciated.

Whether something’s a bug is not an opinion.

I’m thinking of using 3 servers with replication instead, keeping load 
balancing based on source IPs, but I’m worried that this might be less 
reliable. We have 2 spare servers in another location, synced with replication, 
but too often after a server crash the replication would no longer start and 
had to be set up again from scratch, which makes replication look even less 
reliable.

Sorry for the long story, but I’m no Galera expert

Then you could indeed wonder if your company should be using Galera …

and I’m having lots of issues I can’t find any info or solution about.

This is another issue I’m facing with replication, which seems to be caused 
by the Galera cluster: https://jira.mariadb.org/browse/MDEV-29132



From: William Edwards <wedwa...@cyberfusion.nl>
Sent: Wednesday, 27 July 2022 11:58
To: Cédric Counotte <cedric.couno...@1check.com>
Cc: maria-discuss@lists.launchpad.net
Subject: Re: [Maria-discuss] MariaDB server horribly slow on start


On 27 Jul 2022, at 11:46, Cédric Counotte <cedric.couno...@1check.com> wrote:


Hello all. I hope I’m at the right place to ask this question.

I opened a bug here: https://jira.mariadb.org/browse/MDEV-28969, however I was 
told to use this mailing list.



We have 4 MariaDB servers in a Galera Cluster, and sometimes a server has to be 
restarted, be it after a crash (for which I still have to open a bug) or for 
maintenance.



When that happens, the restarted server causes a huge slowdown on the whole 
cluster, and it lasts for 10 to 30 minutes at the very least!



And by huge, I mean huge: we end up with 500 to 800 pending queries on all 
servers, as you can see in the attached screenshots.

I’ve attached the configuration of one of the servers for reference, in case 
this is the source of the issue.



Any way to solve this would be greatly appreciated.

You seem to be focusing on effect. What is the cause? SST?




Regards,

3C.
[image001.png]
_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp