Hi,

On Mon, Jan 26, 2009 at 03:03:44PM +0100, Peter Luciak wrote:
> Hello all,
>
> I have a 2-node, STONITH-enabled cluster with v1 config, which is working 
> pretty fine, including the fencing. The fencing is done via IPMI to a 
> remote management controller on the motherboard of the server, which is 
> connected to a switch. When I kill the heartbeat master process on one 
> node, it is rebooted by the other node.
>
> However, when I perform the "hard takeover" test by unplugging the cables 
> on the server and leaving them unplugged, the cluster falls into a loop, 
> because the IPMI reboot fails to succeed (due to the second server being 
> unplugged) and tries again and again. The logs state that the ipmitool 
> process returned with error code 256.
>
> I know this issue has been discussed here before, and the conclusion was 
> that this is basically an unsolvable situation. I am wondering if anyone 
> else has faced this problem and has come up with a solution? Is it possible 
> to work around it by using a timeout for ipmitool or something similar?

The point of fencing is to make sure that a node cannot run
resources. If the fencing operation cannot be carried out, then
we simply can't know what's going on. How can a timeout or anything
else help in this situation?
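
To illustrate the point, here is a minimal Python sketch. It is not
heartbeat's or any stonith plugin's actual code, and the names fence,
FenceResult, may_take_over and demo are made up for the example. It
wraps a fencing command in a timeout and shows that a timeout, like a
failed command, only yields "unknown", which is never enough to start
resources. The commands in demo() are stand-ins, not real ipmitool
calls.

```python
import subprocess
import sys
from enum import Enum


class FenceResult(Enum):
    CONFIRMED = "confirmed"  # the fencing command reported success
    UNKNOWN = "unknown"      # failure or timeout: peer state is not known


def fence(command, timeout_s):
    """Run a fencing command (an argv list) with a time limit."""
    try:
        proc = subprocess.run(
            command,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A timeout only ends the waiting; it says nothing about the peer.
        return FenceResult.UNKNOWN
    if proc.returncode == 0:
        return FenceResult.CONFIRMED
    return FenceResult.UNKNOWN


def may_take_over(result):
    # Resources may be started only after a confirmed fence.
    return result is FenceResult.CONFIRMED


def demo():
    # Stand-in commands: one exits non-zero, one hangs until the timeout.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
    return [may_take_over(fence(cmd, timeout_s=1)) for cmd in (failing, hanging)]
```

In both cases demo() refuses the takeover. A timeout changes how long
the surviving node waits, not what it knows about the other node.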

> The whole cluster is on a UPS, so the "complete power outage" scenario is 
> unlikely, but still, this seems to me like a loophole in the STONITH 
> design. For now, I have simply disabled it.

If you've disabled stonith, that's really not recommended.

Thanks,

Dejan

> Regards,
> Peter
> -- 
> Peter LUCIAK ([email protected])
> IBL Software Engineering, http://www.iblsoft.com/
> Mierová 103, 82105 Bratislava, Slovakia
> Phone: +421-2-32662111, Fax: +421-2-32662110
> Direct: +421-2-32662175
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems