On 06/27/2013 10:52 AM, Dejan Muhamedagic wrote:
> On Thu, Jun 27, 2013 at 09:54:13AM -0400, Digimer wrote:
>> On 06/27/2013 07:02 AM, Dejan Muhamedagic wrote:
>>> Hi,
>>>
>>> On Wed, Jun 26, 2013 at 03:52:00PM -0400, Digimer wrote:
>>>> This question appears to be the same issue asked here:
>>>>
>>>> http://oss.clusterlabs.org/pipermail/pacemaker/2013-June/018650.html
>>>>
>>>> In my case, I have two fence methods per node; IPMI first with
>>>> action="reboot" and, if that fails, two PDUs (one backing each side of
>>>> the node's redundant PSUs).
>>>>
>>>> Initially I setup the PDUs as action "reboot" figuring that the
>>>> fence_toplogy tied them together, so pacemaker would call "pdu1:port1;
>>>> off -> pdu2:port1; off; (verify both are off) -> pdu1:port1; on ->
>>>> pdu2:port1; on".
>>>>
>>>> This didn't happen though. It called 'pdu1:port1; reboot' then
>>>> "pdu2:port1; reboot", so the first PSU in the node had it's power back
>>>> before the second PSU lost power, meaning the node never powered off.
>>>
>>> I'm not sure if that's supported.
>>
>> Unless I am misunderstood, beekhof indicated that it is/should be.
>
> I'm pretty sure that it's not, but perhaps things changed in the
> meantime. At least it wasn't when we discussed the
> implementation.
>
>>>> So next I tried;
>>>>
>>>> pdu1:port1; off -> pdu2:port1; off -> pdu1:port1; on -> pdu1:port1; on
>>>>
>>>> However, this seemed to have actually done;
>>>>
>>>> pdu1:port1; reboot -> pdu2:port1; reboot -> pdu1:port1; reboot ->
>>>> pdu1:port1; reboot
>>>>
>>>> So again, the node never lost power to both PSUs at the same time, so
>>>> the node didn't power off.
>>>>
>>>> This makes PDU fencing unreliable. I know beekhof said:
>>>>
>>>> "My point would be that action=off is not the correct way to configure
>>>> what you're trying to do."
>>>>
>>>> in the other thread, but there was no elaborating on what *is* the right
>>>> way. So if neither approach works, what is the proper way for configure
>>>> PDU fencing when you have two different PDUs backing either PSU?
>>>
>>> The fence action needs to be defined in the cluster properties
>>> (crm_config/cluster_property_set in XML):
>>>
>>> # crm configure property stonith-action=off
>>>
>>> See the output of:
>>>
>>> $ crm ra info pengine
>>>
>>> for the PE metadata and explanation of properties.
>>
>> In irc last night, beekhof mentioned that action="..." is ignored and
>> replaced. However, it would appear that pcmk_reboot_action="..." should
>> force the issue. I'm planning to test this today.
>
> Yes, true, though it's a bit of a kludge
> (pcmk_reboot_action="off" if I got that right).
>
>>>> I don't want to disable "reboot" globally because I still want the
>>>> IPMI based fencing to do action="reboot".
>>>
>>> I don't think it is possible to define a per-resource fencing
>>> action.
>>>
>>>> If I just do "off", then the
>>>> node will not power back on after a successful fence. This is better
>>>> than nothing, but still quite sub-optimal.
>>>
>>> Yes, if you want to start the cluster stack automatically on
>>> reboot. Anyway, I think that it would be preferred to let a human
>>> check why the node got fenced before letting it join the cluster
>>> again. In that case, one just needs to boot the host manually.
>>>
>>> Thanks,
>>>
>>> Dejan
>>
>> I don't want the cluster stack to start on boot, so I disable
>> pacemaker/corosync. However, I do want the node to power back on so that
>> I can log into it when the alarms go off. Yes, I could log into the good
>> node, manually unfence/boot it and then log in, but this adds minutes to
>> the MTTR that I would realllly like to avoid.
>
> Certainly it adds a bit of time, but only to the node's MTTR,
> not the cluster's MTTR. Anyway, if pacemaker can turn off the
> node, then a short script can also turn it on.
>
> Cheers,
>
> Dejan

If I need to write a script, I will instead write a new fence agent that
handles multiple PDUs in a sensible fashion. I'm already thinking of a
"fence_apc_multi" that takes a string of addresses and ports, does a
clean "off" on all of them, verifies that all are off, and then does an
"on" on all of them. This would make the pacemaker config a lot simpler
and cleaner, and would allow "reboot" to remain the default action.

However, this feels like a really bad solution. It's not uncommon,
particularly in HA environments, to have two separate power rails, each
feeding one of the node's PSUs. RHCS has supported this for a very long
time, and I expect many users will run into this problem as they try to
migrate to RHEL 7.

I see no reason why this can't be handled properly in pacemaker
directly. I'm hoping it can be, and that I am just too new to pacemaker
to realize my mistake.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
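
For reference, here is a rough sketch of the ordering problem described
above and of the sequence a "fence_apc_multi" style agent would need to
follow. It is only a model, not a working fence agent: FakePDUs is a
hypothetical in-memory stand-in for real PDUs, parse_outlets() and its
"address:port,address:port" string format are assumptions made for
illustration, and nothing here talks to real hardware or to pacemaker.

```python
from typing import Dict, List, Tuple

Outlet = Tuple[str, int]  # (PDU address, outlet port); hypothetical naming


def parse_outlets(spec: str) -> List[Outlet]:
    """Parse an assumed 'addr:port,addr:port' string into outlet tuples."""
    outlets = []
    for item in spec.split(","):
        addr, port = item.strip().rsplit(":", 1)
        outlets.append((addr, int(port)))
    return outlets


class FakePDUs:
    """In-memory stand-in for real PDUs; every outlet starts powered on."""

    def __init__(self, outlets: List[Outlet]) -> None:
        self.state: Dict[Outlet, bool] = {o: True for o in outlets}

    def set_power(self, outlet: Outlet, on: bool) -> None:
        self.state[outlet] = on

    def is_on(self, outlet: Outlet) -> bool:
        return self.state[outlet]


def node_has_power(pdus: FakePDUs, outlets: List[Outlet]) -> bool:
    """A node with redundant PSUs stays up while any one outlet is on."""
    return any(pdus.is_on(o) for o in outlets)


def sequential_reboot(pdus: FakePDUs, outlets: List[Outlet]) -> bool:
    """Reboot each outlet in turn; return True if the node ever lost power."""
    lost_power = False
    for outlet in outlets:
        pdus.set_power(outlet, False)
        lost_power = lost_power or not node_has_power(pdus, outlets)
        pdus.set_power(outlet, True)
    return lost_power


def multi_reboot(pdus: FakePDUs, outlets: List[Outlet]) -> bool:
    """Off on all outlets, verify all are off, then on again for all."""
    for outlet in outlets:
        pdus.set_power(outlet, False)
    if node_has_power(pdus, outlets):
        raise RuntimeError("not every outlet is off; fence failed")
    for outlet in outlets:
        pdus.set_power(outlet, True)
    return True  # the node was confirmed fully without power


if __name__ == "__main__":
    outlets = parse_outlets("pdu1:1,pdu2:1")
    print("sequential reboot cut power:",
          sequential_reboot(FakePDUs(outlets), outlets))
    print("off-all/verify/on-all cut power:",
          multi_reboot(FakePDUs(outlets), outlets))
```

With two outlets, the sequential version never sees both off at once,
which matches the behaviour reported earlier in the thread, while the
off-all, verify, on-all version only restores power after confirming
that every outlet is off.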