On 07/01/2013 12:43 PM, Lars Marowsky-Bree wrote:
> On 2013-07-01T11:53:29, Digimer <li...@alteeve.ca> wrote:
>
>> You are right, of course. Imagine though that the IPMI BMC's network
>> port or cable could have silently failed some time before the node
>> failed.
>
> Pacemaker can monitor the fencing device if you configure a monitor
> action for it, for exactly this reason.
My *very* initial testing of op monitor="30" didn't detect the failure
or recovery of the fence device. I may very well have screwed something
up though... I still have a lot to learn.

As an aside, RHEL 6.4 introduced 'fence_check', which will do the same
if you cron/script it.

>> Yes, this is two simultaneous failures so not an overall SPoF, but
>> likely enough that it should be addressed.
>
> Yes ;-)
>
> While it's conceivable that the *fencing* network switch doesn't have a
> dual power supply and thus is affected by the outage (and very very few
> management boards have two network ports so that you could connect them
> to two), the answer here could be to - at least for two node scenarios -
> just connect the management ports to a dedicated NIC on the other node.
> (A ring topology for multiple nodes is conceivable.)
>
> Then a single power failure could well cause both methods to fail.
>
> Still, it's a double failure that we, officially, don't protect against
> in all scenarios (the power failure + whatever causes the fence).

I protect against this scenario by using two switches and plugging IPMI
into the first switch and the PDUs into the second switch. All nodes use
bonded links with one leg in each switch. So the failure of an entire
switch will not cause an interruption or the loss of fencing
capabilities.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
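For anyone who wants to sanity-check the two-switch layout described in this mail, here is a rough Python sketch. It is only a toy model, not anything Pacemaker provides: the names `FENCE_METHODS`, `switch1`, `switch2`, `surviving_methods` and `survives_any_single_switch_failure` are made up for illustration, and it assumes every node can reach both switches through its bonded links.

```python
# Hypothetical model of the layout described in this mail: each fence
# method is reachable through exactly one switch. All names are
# illustrative assumptions, not Pacemaker or fence-agent identifiers.
FENCE_METHODS = {
    "ipmi": "switch1",  # IPMI BMCs plugged into the first switch
    "pdu": "switch2",   # switched PDUs plugged into the second switch
}


def surviving_methods(failed_switch, methods=FENCE_METHODS):
    """Return the fence methods still reachable when one switch is down."""
    return sorted(name for name, switch in methods.items()
                  if switch != failed_switch)


def survives_any_single_switch_failure(methods=FENCE_METHODS):
    """True if every single-switch failure leaves at least one method."""
    switches = set(methods.values())
    return all(surviving_methods(sw, methods) for sw in switches)


if __name__ == "__main__":
    for sw in sorted(set(FENCE_METHODS.values())):
        print(sw, "down ->", surviving_methods(sw))
    print("fencing survives any single switch failure:",
          survives_any_single_switch_failure())
```

If both methods were plugged into the same switch, the check would return False, which is the single point of failure this layout is meant to avoid. The model says nothing about power, so the power failure case discussed in the quoted text is out of its scope.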