Here are the logs from the previous restart.

node1 -> corosync restart at 10:46:15
https://gist.github.com/aderumier/0992051d20f51270ceceb5b3431d18d7

node2
https://gist.github.com/aderumier/eea0c50fefc1d8561868576f417191ba

node5
https://gist.github.com/aderumier/f2ce1bc5a93827045a5691583bbc7a37

----- Original Message -----
From: "Thomas Lamprecht" <t.lampre...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>, "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Cc: "dietmar" <diet...@proxmox.com>
Sent: Tuesday, 15 September 2020 11:46:51
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

On 9/15/20 11:35 AM, Alexandre DERUMIER wrote:
> Hi,
>
> I have finally reproduced it!
>
> But this is with a corosync restart in cron every minute, on node1.
>
> Then: lrm was stuck for too long, for around 60s, and softdog was
> triggered on multiple other nodes.
>
> Here are the logs with full corosync debug at the time of the last corosync restart.
>
> node1 (where corosync is restarted each minute)
> https://gist.github.com/aderumier/c4f192fbce8e96759f91a61906db514e
>
> node2
> https://gist.github.com/aderumier/2d35ea05c1fbff163652e564fc430e67
>
> node5
> https://gist.github.com/aderumier/df1d91cddbb6e15bb0d0193ed8df9273
>
> I'll prepare logs from the previous corosync restart, as the lrm seems to
> have been stuck already before that.

Yeah, that would be good, as yes, the lrm seems to get stuck at around 10:46:21

> Sep 15 10:47:26 m6kvm2 pve-ha-lrm[3736]: loop take too long (65 seconds)

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel