Hi,

Some news: my last test has now been running for 14h, and I haven't had any
problems :)

So it seems that it is indeed fixed! Congratulations!



I wonder if it could be related to this forum user's problem:
https://forum.proxmox.com/threads/proxmox-6-2-corosync-3-rare-and-spontaneous-disruptive-udp-5405-storm-flood.75871/

His problem is that after some corosync lag (he has one cluster stretched over
2 DCs 10 km apart, so I think he sometimes gets small lags), one node floods
the other nodes with a lot of UDP packets. This makes things worse, as
corosync CPU usage goes to 100% (overloaded), and then it can't see the other
nodes.

I had this problem 6 months ago after shutting down a node; that's why I
think it could "maybe" be related.

So I wonder if it could be the same pmxcfs bug, with something looping or
sending packets again and again.

The forum user seems to hit the problem several times a week, so maybe he'll
be able to test the new fixed pmxcfs and tell us whether it fixes this bug
too.



----- Original Message -----
From: "aderumier" <aderum...@odiso.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Sent: Tuesday, September 29, 2020 15:52:18
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

>>huge thanks for all the work on this btw! 

Huge thanks to you! ;)


>>I think I've found a likely culprit (a missing lock around a 
>>non-thread-safe corosync library call) based on the last logs (which 
>>were now finally complete!). 

YES :) 


>>if feedback from your end is positive, I'll whip up a proper patch 
>>tomorrow or on Thursday. 

I'm going to launch a new test right now!


----- Original Message -----
From: "Fabian Grünbichler" <f.gruenbich...@proxmox.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Sent: Tuesday, September 29, 2020 15:28:19
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

huge thanks for all the work on this btw! 

I think I've found a likely culprit (a missing lock around a 
non-thread-safe corosync library call) based on the last logs (which 
were now finally complete!). 
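A minimal sketch of the general pattern described here (serializing every call into a library that is not thread-safe through one shared mutex), written in Python for brevity. This is not the actual pmxcfs patch and does not use the real corosync API: `UnsafeLibrary`, `LockedLibrary` and `run_workers` are hypothetical stand-ins chosen only to illustrate the idea.

```python
import threading


class UnsafeLibrary:
    """Hypothetical stand-in for a non-thread-safe library handle (assumption).

    send() does an unsynchronized read-modify-write of its sequence counter,
    so concurrent callers could reuse or skip sequence numbers.
    """

    def __init__(self):
        self.next_seq = 0
        self.sent = []

    def send(self, payload):
        seq = self.next_seq              # read shared state
        self.sent.append((seq, payload))
        self.next_seq = seq + 1          # write it back (not atomic overall)
        return seq


class LockedLibrary:
    """Hypothetical wrapper: every call into the library holds one mutex."""

    def __init__(self, lib):
        self._lib = lib
        self._lock = threading.Lock()

    def send(self, payload):
        with self._lock:                 # only one thread inside the library
            return self._lib.send(payload)


def run_workers(handle, n_threads=8, msgs_per_thread=1000):
    """Start n_threads threads that each send msgs_per_thread messages."""
    def worker(tid):
        for i in range(msgs_per_thread):
            handle.send((tid, i))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    lib = UnsafeLibrary()
    run_workers(LockedLibrary(lib))
    seqs = [seq for seq, _ in lib.sent]
    # With the lock held, sequence numbers are unique and gap-free.
    assert sorted(seqs) == list(range(len(seqs)))
    print(len(seqs), "messages, sequence numbers consistent")
```

Calling `lib.send` directly from the threads, without the wrapper, is the situation the lock guards against: the unsynchronized read-modify-write can then hand out duplicate sequence numbers.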

rebuilt packages with a proof-of-concept fix:

23b03a48d3aa9c14e86fe8cf9bbb7b00bd8fe9483084b9e0fd75fd67f29f10bec00e317e2a66758713050f36c165d72f107ee3449f9efeb842d3a57c25f8bca7  pve-cluster_6.1-8_amd64.deb
9e1addd676513b176f5afb67cc6d85630e7da9bbbf63562421b4fd2a3916b3b2af922df555059b99f8b0b9e64171101a1c9973846e25f9144ded9d487450baef  pve-cluster-dbgsym_6.1-8_amd64.deb

I removed some logging statements which are no longer needed, so output 
is a bit less verbose again. if you are not able to trigger the issue 
with this package, feel free to remove the -debug and let it run for a 
little longer without the massive logs. 

if feedback from your end is positive, I'll whip up a proper patch 
tomorrow or on Thursday. 


_______________________________________________ 
pve-devel mailing list 
pve-devel@lists.proxmox.com 
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 

