Hi,

On Mon, Jan 26, 2009 at 03:03:44PM +0100, Peter Luciak wrote:
> Hello all,
>
> I have a 2-node, STONITH-enabled cluster with v1 config, which is working
> pretty fine, including the fencing. The fencing is done via IPMI to a
> remote management controller on the motherboard of the server, which is
> connected to a switch. When I kill the heartbeat master process on one
> node, it is rebooted by the other node.
>
> However, when I perform the "hard takeover" test by unplugging the cables
> on the server and leaving them unplugged, the cluster falls into a loop,
> because the IPMI reboot fails to succeed (due to the second server being
> unplugged) and tries again and again. The logs state that the ipmitool
> process returned with error code 256.
>
> I know this issue has been discussed here before, and the conclusion was
> that this is basically an unsolvable situation. I am wondering if anyone
> else has faced this problem and has come up with a solution? Is it possible
> to workaround it by using a timeout for ipmitool or something similar?

The point of fencing is to make sure that a node cannot run
resources. If the fencing operation cannot be carried out, then
we simply can't know what's going on. How can a timeout or anything
else help in this situation?

> The whole cluster is on a UPS, so the "complete power outage" scenario is
> unlikely, but still, this seems to me like a loophole in the STONITH
> design. For now, I have simply disabled it.

If you've disabled stonith, that's really not recommended.

Thanks,

Dejan

> Regards,
> Peter
> --
> Peter LUCIAK ([email protected])
> IBL Software Engineering, http://www.iblsoft.com/
> Mierová 103, 82105 Bratislava, Slovakia
> Phone: +421-2-32662111, Fax: +421-2-32662110
> Direct: +421-2-32662175
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
