Hi, I have done a new test, this time with "systemctl stop corosync", wait 15s, "systemctl start corosync", wait 15s.
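For anyone who wants to rerun this, the stop/wait/start/wait cycle can be sketched roughly as follows. The sketch also includes a probe for the other nodes that prints a timestamp when /etc/pve stops accepting writes. It is an illustration of the procedure, not necessarily how it was run here. The script name, the "cycle"/"probe" modes, PAUSE and PROBE_TIMEOUT are all hypothetical. A write that fails, or that hangs longer than PROBE_TIMEOUT, is simply counted as "locked".

```shell
#!/bin/bash
# Hypothetical helper (corosync-cycle.sh): stop/start cycle for node1 plus a
# writability probe for the other nodes. Names and knobs are assumptions.
PAUSE=${PAUSE:-15}                 # seconds to wait after each stop/start
PROBE_TIMEOUT=${PROBE_TIMEOUT:-5}  # a write slower than this counts as locked

# node1: stop corosync, wait, start it, wait; log a timestamp for each step.
cycle_corosync() {
    local cycles=$1 i
    for ((i = 1; i <= cycles; i++)); do
        echo "corosync stop : $(date +%T)"
        systemctl stop corosync
        sleep "$PAUSE"
        echo "corosync start: $(date +%T)"
        systemctl start corosync
        sleep "$PAUSE"
    done
}

# One write attempt; succeeds only if it completes within PROBE_TIMEOUT.
# -k 1 sends SIGKILL if the writer is still alive 1 s after SIGTERM.
probe_once() {
    timeout -k 1 "$PROBE_TIMEOUT" sh -c 'echo test > "$1"' _ "$1" 2>/dev/null
}

# Other nodes: probe once per second and print a line only when the state
# changes. max_checks = 0 (the default) means run until interrupted.
watch_writable() {
    local file=$1 max=${2:-0} n=0 state="" new
    while (( max == 0 || n < max )); do
        if probe_once "$file"; then new="writable"; else new="locked"; fi
        if [[ "$new" != "$state" ]]; then
            echo "$file $new : $(date +%T)"
            state=$new
        fi
        n=$((n + 1))
        sleep 1
    done
}

# Dispatch only when executed directly with a mode and an argument, e.g.
#   node1:        ./corosync-cycle.sh cycle 10
#   other nodes:  ./corosync-cycle.sh probe /etc/pve/test
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 2 ]]; then
    case $1 in
        cycle) cycle_corosync "$2" ;;
        probe) watch_writable "$2" ;;
        *) echo "usage: $0 cycle COUNT | probe FILE" >&2; exit 2 ;;
    esac
fi
```

Without a mode argument the script only defines the functions, so it can also be sourced into an interactive shell. Lining up the probe output on node2 against the "corosync stop" lines from node1 gives the same kind of 1-second gap reported below.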
I was able to reproduce it on the corosync stop on node1: 1 second later, /etc/pve was locked on all other nodes. I started corosync again on node1 about 10 minutes later, and /etc/pve became writable again on all nodes.

node1: corosync stop: 01:26:50
node2: /etc/pve locked: 01:26:51
http://odisoweb1.odiso.net/corosync-stop.log

pmxcfs: bt full, all threads: https://gist.github.com/aderumier/c45af4ee73b80330367e416af858bc65
pmxcfs: coredump: http://odisoweb1.odiso.net/core.17995.gz

node1: corosync start: 01:35:36
http://odisoweb1.odiso.net/corosync-start.log

BTW, a user following this mailing thread contacted me by PM on the forum. He recently had exactly the same problem with a 7-node cluster: after shutting down 1 node, /etc/pve stayed locked until that node was restarted.

----- Original Message -----
From: "Thomas Lamprecht" <t.lampre...@proxmox.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>, "aderumier" <aderum...@odiso.com>
Sent: Thursday, September 17, 2020 13:35:55
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

On 9/17/20 12:02 PM, Alexandre DERUMIER wrote:
> if needed, here is my test script to reproduce it

thanks, I'm now using this specific one. I had a similar one (but with all nodes writing) running here for ~two hours without luck yet, let's see how this behaves.
>
> node1 (restart corosync until node2 stops sending the timestamp)
> -----
>
> #!/bin/bash
>
> for i in `seq 10000`; do
>     now=$(date +"%T")
>     echo "restart corosync : $now"
>     systemctl restart corosync
>     for j in {1..59}; do
>         # /tmp/timestamp is refreshed every second by node2's script
>         last=$(cat /tmp/timestamp)
>         curr=`date '+%s'`
>         diff=$(($curr - $last))
>         # older than 20 s: node2's write to /etc/pve is blocked, stop here
>         if [ $diff -gt 20 ]; then
>             echo "too old"
>             exit 0
>         fi
>         sleep 1
>     done
> done
>
>
>
> node2 (write to /etc/pve/test each second, then send the last timestamp to
> node1)
> -----
> #!/bin/bash
> for i in {1..10000};
> do
>     now=$(date +"%T")
>     echo "Current time : $now"
>     curr=`date '+%s'`
>     ssh root@node1 "echo $curr > /tmp/timestamp"
>     echo "test" > /etc/pve/test
>     sleep 1
> done
>

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel