On 03/09/13 14:14, Vladislav Bogdanov wrote:
03.09.2013 07:04, Digimer wrote:
...
To solve problem 1, you can set a delay against one of the nodes. Say
you set the fence primitive for node 1 to have 'delay="15"'. When node
1 goes to fence node 2, it starts immediately. When node 2 starts to
fence node 1, it sees the 15-second delay and pauses. Node 1 will power
off node 2 long before node 2 finishes the pause. You can further
reduce this problem by disabling acpid on the nodes. Without it, a node
powers off almost instantly when it gets the power-off signal from the
BMC, which shortens the window in which both nodes can initiate a fence.
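
To make the timing argument concrete, here is a rough sketch of the race as a
toy model. It is not how Pacemaker or any fence agent is implemented. The
function name fence_race, the node names, and the 4-second power-off figure
are all assumptions made for illustration; only the 15-second delay comes from
the text above.

```python
def fence_race(delays, power_off_seconds):
    """Toy model of two nodes trying to fence each other at the same moment.

    delays: maps a *target* node name to the delay (in seconds) configured on
            the fence primitive for that target. A missing entry means no delay.
    power_off_seconds: hypothetical time a fence action takes to cut power
            once it actually starts (smaller when acpid is disabled).

    Returns (killed, survivors) as two sorted lists of node names.
    """
    nodes = ["node1", "node2"]

    # Moment at which each target would lose power if nothing interrupts
    # the node that is fencing it.
    death_time = {
        target: delays.get(target, 0.0) + power_off_seconds
        for target in nodes
    }

    # Whoever loses power first can no longer finish its own fence action,
    # so only the node(s) with the earliest death time actually go down.
    earliest = min(death_time.values())
    killed = sorted(n for n, t in death_time.items() if t == earliest)
    survivors = sorted(set(nodes) - set(killed))
    return killed, survivors


if __name__ == "__main__":
    # Delay on the primitive that fences node1: node2 must wait, node1 wins.
    print(fence_race({"node1": 15.0}, power_off_seconds=4.0))
    # No delay anywhere: both fence actions land together (dual fence).
    print(fence_race({}, power_off_seconds=4.0))
```

In this model the delayed case leaves node1 as the only survivor, while the
no-delay case takes both nodes down, which is the situation the delay is meant
to avoid.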

Does anybody know for sure how and *why* it works? I mean, why does
disabling the userspace ACPI event reader (which just reads what the
kernel sends after hardware events) affect how the hardware behaves?

Disabling acpid causes, in my experience, the node to instantly power down when it receives a power-button event. How/why this happens is probably buried in the kernel source and/or ACPI definitions.

To solve problem 2, simply prevent corosync/pacemaker from starting on
boot. This way, the fenced node will (hopefully) come back up and run,
so you can ssh into it and look at what happened. It won't try to rejoin
the cluster though, so there is no risk of a fence loop.

An enhancement to this would be to re-enable corosync/pacemaker during a
clean shutdown and disable them again after boot.

That would be a good idea, actually. I like that.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
