03.09.2013 07:04, Digimer wrote:
...
> To solve problem 1, you can set a delay against one of the nodes. Say
> you set the fence primitive for node 01 to have 'delay="15"'. When node
> 1 goes to fence node 2, it starts immediately. When node 2 starts to
> fence node 1, it sees the 15 second delay and pauses. Node 1 will power
> off node 2 long before node 2 finishes the pause. You can further help
> this problem by disabling acpid on the nodes. Without it, the power-off
> signal from the BMC will be nearly instant, shortening up the window
> where both nodes can initiate a fence.
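To make the timing argument in the quoted text concrete, here is a minimal Python sketch of the two-node fence race. It is only a toy model, not anything taken from Pacemaker: the function name `fence_race`, its parameters, and the latency figures in the usage note are all hypothetical. It assumes that a delay set on the fence primitive for a node postpones the peer's power-off request against that node, and that a request already sent to the BMC completes after `poweroff_latency` seconds even if the sender dies in the meantime. It does not model acpid at all; the latency is simply an input.

```python
def fence_race(delay_on_node1, delay_on_node2, poweroff_latency):
    """Return the sorted list of nodes that survive a simultaneous fence race.

    delay_on_nodeX   -- delay (seconds) configured on the fence primitive
                        that targets node X, i.e. how long the peer waits
                        before asking node X's BMC to power it off.
    poweroff_latency -- seconds between the request reaching the BMC and
                        the victim actually losing power (assumed > 0).
    """
    peer = {"node1": "node2", "node2": "node1"}
    # Time at which each attacker sends its power-off request.
    send_time = {"node1": delay_on_node2, "node2": delay_on_node1}

    death_time = {}
    # Process attackers in the order in which they would send.
    for attacker in sorted(send_time, key=send_time.get):
        # An attacker that has already lost power never sends its request.
        if attacker in death_time and death_time[attacker] <= send_time[attacker]:
            continue
        death_time[peer[attacker]] = send_time[attacker] + poweroff_latency

    return sorted(node for node in peer if node not in death_time)
```

With made-up numbers: `fence_race(0, 0, 4)` leaves no survivor (both requests go out at once), `fence_race(15, 0, 4)` leaves only node1, and `fence_race(15, 0, 20)` again leaves no survivor, because the power-off takes longer than the delay. In this model the delay only helps while the power-off latency stays below it, which is the part of the argument that a shorter power-off is supposed to help with.
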

Does anybody know for sure how and *why* this works? I mean, why would
disabling the userspace ACPI event reader (which only reads what the
kernel sends after hardware events) affect how the hardware behaves?

>
> To solve problem 2, simply disable corosync/pacemaker from starting on
> boot. This way, the fenced node will be (hopefully) back up and running,
> so you can ssh into it and look at what happened. It won't try to rejoin
> the cluster though, so no risk of a fence loop.

An enhancement to this would be to re-enable corosync/pacemaker during a
clean shutdown and disable them again after boot.

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
