On 2011-10-13 23:34, Charles Richard wrote:
> That does clear a few things, thanks!
>
> One thing I'm still not clear though is on the stonith front, i thought i
> saw an article that basically said that only stonith hardware devices are
> really acceptable as their power source is independant of the server. Our
> servers do support IPMI but the IPMI lan communication will be cut off if
> the lan interface goes down so the other node that should be shooting the
> node is not able to. I might have missed something here but that's the
> impression i get.

The possibility of connection loss to the fencing device is an issue
you'll have to solve regardless of its type. If a "stonith hardware
device", per your definition, is a PDU or power switch, then how do you
think the cluster nodes are going to talk to that? Carrier pigeons?
Those are obviously going to be talked to by means of some form of
network communication as well.

On-board fencing devices have an entirely different issue, and I believe
that's the one you ought to be spending more thought on. Consider this:

1. You cut the power to one of the nodes.

2. Naturally, it stops responding to cluster messaging on the wire.

3. The remaining cluster nodes must now fence the non-responding node to
   make sure it's effectively dead.

4. So one of the remaining nodes contacts the IPMI device, and if that
   device reports that yes, power is out, I'm down, then all is fine and
   the cluster can proceed with resource failover.

5. If that IPMI device is not responding because of lack of power,
   though, the cluster must assume that fencing has failed and must
   either freeze, or fall back to an alternate fencing method.

Many vendors now solve issue #5 by slapping a battery or supercapacitor
on the IPMI BMC, which keeps out-of-band management available for some
time even after power has failed. This, again, is something that is
readily available on most current servers, and it certainly doesn't mean
an outrageously large investment.

I just very recently consulted on a Pacemaker cluster with IPMI-based
fencing for a US customer, where we tested this by yanking the power
cords from both PSUs on an active cluster node, and fencing plus
resource failover worked like a charm.

Hope this helps.

Cheers,
Florian

--
Need help with Pacemaker?
http://www.hastexo.com/now
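
For illustration only, the decision described in steps 4 and 5 can be
sketched in Python. This is not how Pacemaker or any fence agent is
implemented; PowerStatus, Action and decide() are hypothetical names made
up for this sketch. The point it tries to show is that only a positive
"power is off" answer from the fencing device allows failover, and that
silence from the device counts as a failed fencing attempt.

```python
from enum import Enum


class PowerStatus(Enum):
    """What the fencing device reported after the fence request (hypothetical)."""
    ON = "on"
    OFF = "off"
    UNREACHABLE = "unreachable"  # the BMC did not answer at all


class Action(Enum):
    """What the surviving node may do next (hypothetical)."""
    PROCEED_WITH_FAILOVER = "proceed with failover"
    TRY_NEXT_DEVICE = "try alternate fencing method"
    FREEZE = "freeze"


def decide(status_after_fence, has_fallback_device):
    """Map the fencing result to the next cluster action."""
    if status_after_fence is PowerStatus.OFF:
        # Step 4: the device confirms the node is down, so failover is safe.
        return Action.PROCEED_WITH_FAILOVER
    # Step 5: no confirmation (device unreachable, or node still powered).
    # Fencing has failed; the node must never be assumed dead.
    if has_fallback_device:
        return Action.TRY_NEXT_DEVICE
    return Action.FREEZE


if __name__ == "__main__":
    # A BMC that lost power together with the node and has no battery:
    print(decide(PowerStatus.UNREACHABLE, has_fallback_device=False).value)
```

A BMC with a battery or supercapacitor turns the UNREACHABLE case into an
OFF answer for as long as the backup power lasts, which is why the
pull-the-plug test can succeed with IPMI alone.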
