On Fri, Dec 7, 2012 at 3:19 PM, Gao,Yan <y...@suse.com> wrote:
> On 12/07/12 12:09, Andrew Beekhof wrote:
>> On Fri, Dec 7, 2012 at 3:00 PM, Gao,Yan <y...@suse.com> wrote:
>>> On 12/07/12 07:38, Andrew Beekhof wrote:
>>>>
>>>> On 06/12/2012, at 10:42 PM, Lars Marowsky-Bree <l...@suse.com> wrote:
>>>>
>>>>> On 2012-12-06T22:25:40, Andrew Beekhof <and...@beekhof.net> wrote:
>>>>>
>>>>>> But any failures of the nagios agents would count against the VM's
>>>>>> migration-threshold.
>>>>>> So if moving were the right thing to do, it would have done it already.
>>>>>
>>>>> OK. I think this was due to me still being stuck on the workings of an
>>>>> order constraint, but of course if the failures are instead attributed
>>>>> to the container, this would happen automatically already. True.
>>>>>
>>>>> (Incidentally, I like "attribute", "ascribe" better than "delegate"
>>>>> because to me, they better fit what's going on, if we stuck with
>>>>> "delegate-failures". Just saying. ;-)
>>>>
>>>> My use of "delegate" comes from my time with Objective-C where it's common
>>>> practice to use them for "I'm not going to handle X but here is something
>>>> that does" style functionality.
>>>> Which fits nicely with what we're doing here.
>>>>
>>>> container="vm" also works though.
>>>>
>>>>>
>>>>>>> We already have on-fail settings. How would these play together?
>>>>>> Good question. My initial thought was that it would be up to on-fail
>>>>>> settings in the VM.
>>>>>
>>>>> I'd prefer to keep that separate (as proposed below). Because if an
>>>>> action of the *VM* really fails, I may want an admin to look into it
>>>>> (why could the bloody hypervisor not start/stop it?), which is different
>>>>> from restarting the VM if one of the resources within it needs that.
>>>>>
>>>>>>> Would it even make sense to have on-fail="restart-container"? (Or a
>>>>>>> nicer wording.)
>>>>>>>
>>>>>>> Hmmm. That might work. We allow a "container" to be specified as a meta
>>>>>>> attribute.
>>>>>>>
>>>>>>> If set, on-fail would default to restart container for most actions. But
>>>>>>> admins could actually modify it - say, they might want to set
>>>>>>> monitor on-fail="ignore" to just get notified. And when we move forward
>>>>>>> to whiteboxes, we could have start/monitor/promote/demote
>>>>>>> on-fail="restart" (like now) and stop on-fail="restart-container".
>>>>>>>
>>>>>>> That appears reasonably neat?
>>>>>> It does actually.
>>>>>> I wasn't originally thinking it was necessary but it makes sense now
>>>>>> that you point it out.
>>>>>
>>>>> Yes, I think I like this too now.
>>>>>
>>>>> Uhm. Would "container" imply ordering + colocation, or would we still
>>>>> need them grouped (resource_set'ed, whatever)?
>>>>
>>>> Ordering: absolutely
>>> Would any user not like the implied order? Instead want an asymmetrical
>>> or some curious one?
>>
>> Conceptually it doesn't make any sense IMHO.
>> By definition things can't be in/on the container if the container
>> doesn't exist yet.
> Right.
>
>>
>> The one thing we've not addressed yet is probing, that's going to be fun :)
> I guess there should be some way for the nagios RAs to return
> NOT_RUNNING if there's nothing yet, no?

Right, but it's talking to an IP address.
Once the guest is up, it can be seen from all the nodes, so a reprobe would
make it appear to be active _everywhere_.
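
To make the probing problem concrete, here is a rough Python sketch. It is purely illustrative: the node names and the `probe_from` and `reprobe` helpers are made up for this mail and are not part of Pacemaker or the nagios agents. The only real values are the two OCF return codes. It assumes the check does nothing more than contact the guest's IP address, so its answer cannot depend on which node runs it.

```python
# Illustrative only: hypothetical names, not Pacemaker or nagios agent code.
OCF_SUCCESS = 0      # standard OCF return code: resource is running
OCF_NOT_RUNNING = 7  # standard OCF return code: resource is cleanly stopped

CLUSTER_NODES = ["node1", "node2", "node3"]  # hypothetical node names


def probe_from(node, guest_is_up):
    """Simulate a probe of a nagios-style check executed on `node`.

    The check only talks to the guest's IP address, so the result depends
    on whether the guest is reachable, not on which node is asking.
    """
    return OCF_SUCCESS if guest_is_up else OCF_NOT_RUNNING


def reprobe(guest_is_up):
    """Probe from every node and return the nodes that report 'active'."""
    return [
        node for node in CLUSTER_NODES
        if probe_from(node, guest_is_up) == OCF_SUCCESS
    ]


if __name__ == "__main__":
    print("guest down:", reprobe(guest_is_up=False))
    print("guest up:  ", reprobe(guest_is_up=True))
```

With the guest down, no node reports the resource as active, which is the NOT_RUNNING case suggested above. With the guest up, every node reports it as active, which is exactly the "active everywhere" result a reprobe would produce.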