Hi,

On 30.09.20 08:09, Alexandre DERUMIER wrote:
> some news: my last test has been running for 14h now, and I haven't had any
> problems :)
>

great! Thanks for all your testing time. This would have been much harder,
if possible at all, without you providing so much testing effort on a
production(!) cluster, which is much appreciated! Naturally, many thanks to
Fabian too, for reading so many logs without going insane :-)

> So, it seems that it is indeed fixed! Congratulations!
>

Honza confirmed Fabian's suspicion that cpg_mcast_joined lacks thread-safety
guarantees, which sadly was not documented. So this was surely a bug; let's
hope it's the last of these hard-to-reproduce ones.

>
>
> I wonder if it could be related to this forum user
> https://forum.proxmox.com/threads/proxmox-6-2-corosync-3-rare-and-spontaneous-disruptive-udp-5405-storm-flood.75871/
>
> His problem is that after corosync lag (he has 1 cluster stretched across 2 DCs
> with 10km distance, so I think sometimes he's having some small lag),
> 1 node is flooding other nodes with a lot of udp packets (and making things
> worse, as corosync cpu is going to 100% / overloaded, and then it can't see
> the other nodes).

I can imagine this problem showing up as a side effect of a flood where
partition changes happen, but I'm not so sure it can be the direct cause.

>
> I had this problem 6 months ago after shutting down a node, that's why I'm
> thinking it could "maybe" be related.
>
> So, I wonder if it could be the same pmxcfs bug, with something looping or
> sending packets again and again.
>
> The forum user seems to have the problem multiple times a week, so maybe
> he'll be able to test the fixed pmxcfs and tell us if it fixes this
> bug too.

Testing it once it's available would surely be a good idea for them.

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel