On 09/10/2015 18:36, Gilou wrote:
> On 09/10/2015 18:21, Dietmar Maurer wrote:
>>> So I tried again.. HA doesn't work.
>>> Both resources are now frozen (?), and they didn't restart... Even after
>>> 5 minutes...
>>> service vm:102 (pve1, freeze)
>>> service vm:303 (pve1, freeze)
>>
>> The question is why they are frozen. The only action which
>> puts them to 'freeze' is when you shutdown a node.
>>
>
> I pulled the ethernet cables out of the to-be-failing node when I
> tested. It didn't shut down. I plugged them back in 20 minutes later.
> They were down (so I guess the fencing worked). But still?
>

OK, so I did a fresh reinstall of 3 nodes from the PVE 4 ISO. They use a
single NIC to communicate with an NFS server and with each other. The
cluster is up, and one VM is protected:

# ha-manager status
quorum OK
master pve1 (active, Fri Oct 9 19:55:06 2015)
lrm pve1 (active, Fri Oct 9 19:55:12 2015)
lrm pve2 (active, Fri Oct 9 19:55:07 2015)
lrm pve3 (active, Fri Oct 9 19:55:10 2015)
service vm:100 (pve2, started)

# pvecm status
Quorum information
------------------
Date:             Fri Oct 9 19:55:22 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          12
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 192.168.44.129
0x00000003          1 192.168.44.132
0x00000001          1 192.168.44.143 (local)

On one of the nodes (incidentally, the one running the HA VM), I already
get these:

Oct 09 19:55:07 pve2 pve-ha-lrm[1211]: watchdog update failed - Broken pipe

Not good.

I tried to migrate the VM to pve1 to see what happens:

Executing HA migrate for VM 100 to node pve1
unable to open file '/etc/pve/ha/crm_commands.tmp.3377' - No such file or directory
TASK ERROR: command 'ha-manager migrate vm:100 pve1' failed: exit code 2

OK.. so we can't migrate running HA VMs? What did I get wrong here?

So. I removed the VM from HA and migrated it to pve1 to see what happens.
It works. OK. I stopped the VM and enabled HA. It won't start:

service vm:100 (pve1, freeze)

OK. And now, on pve1:

Oct 09 19:59:16 pve1 pve-ha-crm[1202]: watchdog update failed - Broken pipe

OK... Let's try pve3: cold migrate without HA, then enable HA again..
Interesting, now we have:

# ha-manager status
quorum OK
master pve1 (active, Fri Oct 9 20:09:46 2015)
lrm pve1 (old timestamp - dead?, Fri Oct 9 19:58:57 2015)
lrm pve2 (active, Fri Oct 9 20:09:47 2015)
lrm pve3 (active, Fri Oct 9 20:09:50 2015)
service vm:100 (pve3, started)

Why is pve1 not reporting properly...
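
For anyone wanting to spot a stale LRM entry like the one in the last
ha-manager status output automatically, here is a minimal Python sketch.
It assumes the line format pasted in this mail. The function name
stale_lrms and the 60 second threshold are hypothetical choices for
illustration, not anything ha-manager itself defines.

```python
import re
from datetime import datetime

# Matches lines such as:
#   lrm pve1 (old timestamp - dead?, Fri Oct 9 19:58:57 2015)
# Assumption: the format is exactly what was pasted in this mail.
LINE_RE = re.compile(
    r"^(master|lrm)\s+(\S+)\s+\((.*),\s+"
    r"(\w{3} \w{3}\s+\d+ \d\d:\d\d:\d\d \d{4})\)$"
)


def stale_lrms(status_text, max_lag_seconds=60):
    """Return the LRM node names whose timestamp lags the master's
    timestamp by more than max_lag_seconds (hypothetical threshold)."""
    master_time = None
    lrm_times = {}
    for line in status_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip "quorum OK", "service ..." and blank lines
        kind, node, _state, stamp = match.groups()
        # Collapse repeated spaces so a padded day ("Oct  9") also parses.
        when = datetime.strptime(" ".join(stamp.split()),
                                 "%a %b %d %H:%M:%S %Y")
        if kind == "master":
            master_time = when
        else:
            lrm_times[node] = when
    if master_time is None:
        return []
    return sorted(
        node for node, when in lrm_times.items()
        if (master_time - when).total_seconds() > max_lag_seconds
    )
```

Fed the second status output above, this would flag only pve1, whose LRM
timestamp is about 11 minutes behind the master's.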

And now on 3 nodes:

Oct 09 20:10:40 pve3 pve-ha-lrm[1208]: watchdog update failed - Broken pipe
Oct 09 20:10:50 pve3 pve-ha-lrm[1208]: watchdog update failed - Broken pipe
Oct 09 20:11:00 pve3 pve-ha-lrm[1208]: watchdog update failed - Broken pipe

Wtf? omping reports multicast is getting through, but I'm not sure what
would be the issue there... It worked on 3.4 on the same physical setup.
So?