
On 30.09.20 08:09, Alexandre DERUMIER wrote:
> some news, my last test has been running for 14h now, and I haven't had any 
> problem :)

Great! Thanks for all your testing time. This would have been much harder,
if possible at all, without you providing so much testing effort on a
production(!) cluster - appreciated!

Naturally many thanks to Fabian too, for reading so many logs without going
insane :-)

> So, it seems that it is indeed fixed! Congratulations!

Honza confirmed Fabian's suspicion that cpg_mcast_joined lacks thread-safety
guarantees, which was sadly not documented, so this is surely a bug. Let's
hope it is the last of such hard-to-reproduce ones.
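
To illustrate the class of problem: when a send function gives no
thread-safety guarantee, one common approach is to serialize all callers
behind a single lock. The following is only a minimal sketch under that
assumption, not the actual pmxcfs change. unsafe_send() is a hypothetical
stand-in for a call like cpg_mcast_joined, and locked_send(), sender_thread()
and run_senders() are made-up names for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16
#define SENDS_PER_THREAD 10000

/* Shared state touched by the hypothetical, non-thread-safe send call. */
static unsigned long sent_count = 0;

/* Hypothetical stand-in for a send function without thread-safety
 * guarantees: it does an unsynchronized read-modify-write. */
static int unsafe_send(const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    unsigned long tmp = sent_count;
    sent_count = tmp + 1;
    return 0;
}

/* One process-wide lock that every sender has to take. */
static pthread_mutex_t send_lock = PTHREAD_MUTEX_INITIALIZER;

/* Wrapper that serializes all calls into unsafe_send(). */
static int locked_send(const void *buf, size_t len)
{
    pthread_mutex_lock(&send_lock);
    int rc = unsafe_send(buf, len);
    pthread_mutex_unlock(&send_lock);
    return rc;
}

static void *sender_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < SENDS_PER_THREAD; i++)
        locked_send("x", 1);
    return NULL;
}

/* Start up to nthreads senders, wait for them, and return how many sends
 * were recorded. Threads that fail to start are simply not counted. */
static unsigned long run_senders(int nthreads)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;

    assert(nthreads >= 0 && nthreads <= MAX_THREADS);
    sent_count = 0;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, sender_thread, NULL) == 0)
            started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return sent_count;
}
```

With the lock in place, run_senders(4) should record every send from every
thread that was started. Calling unsafe_send() directly from several threads
would be a data race and could lose updates.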

> I wonder if it could be related to this forum user
> https://forum.proxmox.com/threads/proxmox-6-2-corosync-3-rare-and-spontaneous-disruptive-udp-5405-storm-flood.75871/
> His problem is that after corosync lag (he has 1 cluster stretched across
> 2 DCs 10km apart, so I think he sometimes has some small lag),
> 1 node floods the other nodes with a lot of UDP packets (and makes things
> worse, as corosync CPU goes to 100% / is overloaded and then can't see
> the other nodes).

I can imagine this problem showing up as a side effect of a flood where
changes happen. I'm not so sure that it can directly be the cause of one.

> I had this problem 6 months ago after shutting down a node, that's why I'm 
> thinking it could "maybe" be related.
> So, I wonder if it could be the same pmxcfs bug, with something looping or
> sending packets again and again.
> The forum user seems to have the problem multiple times in some weeks, so maybe 
> he'll be able to test the newly fixed pmxcfs and tell us if it fixes this 
> bug too.

Testing it once available would surely be a good idea for them.

pve-devel mailing list
