On 01/07/2013, at 5:17 PM, Florian Crouzat <gen...@floriancrouzat.net> wrote:
> On 29/06/2013 01:22, Andrew Beekhof wrote:
>>
>> On 29/06/2013, at 12:22 AM, Digimer <li...@alteeve.ca> wrote:
>>
>>> On 06/28/2013 06:21 AM, Andrew Beekhof wrote:
>>>>
>>>> On 28/06/2013, at 5:22 PM, Lars Marowsky-Bree <l...@suse.com> wrote:
>>>>
>>>>> On 2013-06-27T12:53:01, Digimer <li...@alteeve.ca> wrote:
>>>>>
>>>>>> primitive fence_n01_psu1_off stonith:fence_apc_snmp \
>>>>>>         params ipaddr="an-p01" pcmk_reboot_action="off" port="1" pcmk_host_list="an-c03n01.alteeve.ca"
>>>>>> primitive fence_n01_psu1_on stonith:fence_apc_snmp \
>>>>>>         params ipaddr="an-p01" pcmk_reboot_action="on" port="1" pcmk_host_list="an-c03n01.alteeve.ca"
>>>>>
>>>>> So every device twice, including location constraints? I see potential
>>>>> for optimization by improving how the fence code handles this ... That's
>>>>> abhorrently complex. (And I'm not sure the 'action' parameter ought to
>>>>> be overwritten.)
>>>>
>>>> I'm not crazy about it either, because it means the device is tied to a
>>>> specific command.
>>>> But it seems to be something all the RHCS people try to do...
>>>
>>> Maybe something in the RHCS water cooler made us all mad... ;)
>>>
>>>>> Glad you got it working, though.
>>>>>
>>>>>> location loc_fence_n01_ipmi fence_n01_ipmi -inf: an-c03n01.alteeve.ca
>>>>> [...]
>>>>>
>>>>> I'm not sure you need any of these location constraints, by the way.
>>>>> Did you test whether it works without them?
>>>>>
>>>>>> Again, this is after just one test. I will want to test it several more
>>>>>> times before I consider it reliable. Ideally, I would love to hear
>>>>>> Andrew or others confirm this looks sane/correct.
>>>>>
>>>>> It looks correct, but not quite sane. ;-) That seems not to be
>>>>> something you can address, though. I'm thinking that fencing topology
>>>>> should be smart enough, if multiple fencing devices are specified, to
>>>>> know how to expand them to "first all off (if off fails anywhere, it's a
>>>>> failure), then all on (if on fails, it is not a failure)". That'd
>>>>> greatly simplify the syntax.
>>>>
>>>> The RH agents have apparently already been updated to support multiple
>>>> ports.
>>>> I'm really not keen on having stonith-ng do this.
>>>
>>> This doesn't help people who have dual power rails/PDUs for power
>>> redundancy.
>>
>> I'm yet to be convinced that having two PDUs is helping those people in
>> the first place.
>> If it were actually useful, I suspect more than two/three people would
>> have asked for it in the last decade.
>
> Well, it's probably because many people are still toying around with
> Pacemaker, and I assume that not many advanced RHCS users have yet tried
> to translate their current RHCS cluster to Pacemaker. Digimer and I did,
> and we both failed to get the equivalent <device> configuration we had in
> our RHCS setup.

Yes, but RHEL isn't the only enterprise distro out there.
It's not like Pacemaker has never been deployed in critical environments
during the last decade.
German Air Traffic Control (http://www.novell.com/success/dfs.html), for
example.
Will planes fall out of the sky if your cluster fails?

> I suspect more and more people will hit this issue sooner or later.
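For the record, an untested sketch (crm shell syntax, reusing the primitive
names from Digimer's snippet) of how the existing fencing topology support
can chain those devices: devices in the same level are comma-separated, are
run in the order listed, and all of them must succeed, while the levels
themselves are tried in order. So IPMI is attempted first and the PDU
off/on pair only if it fails; a second PDU would just add its own _off/_on
pair to the same level.

    # level 1: IPMI; level 2: cut the power, then restore it (both must succeed)
    fencing_topology \
        an-c03n01.alteeve.ca: fence_n01_ipmi fence_n01_psu1_off,fence_n01_psu1_on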
>
> Anyway, whatever will follow in terms of configuration primitives or API,
> thanks to Digimer's tests we now have something (even if inelegant)
> working :)
>
> --
> Cheers,
> Florian Crouzat

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org