Re: [Pacemaker] Enable remote monitoring

Andrew Beekhof Thu, 06 Dec 2012 15:44:41 -0800

On 07/12/2012, at 10:19 AM, David Vossel <dvos...@redhat.com> wrote:


> ----- Original Message -----
>> From: "Yan Gao" <y...@suse.com>
>> To: pacemaker@oss.clusterlabs.org
>> Sent: Thursday, December 6, 2012 12:28:06 PM
>> Subject: Re: [Pacemaker] Enable remote monitoring
>> 
>> Hi,
>> 
>> On 12/06/12 19:42, Lars Marowsky-Bree wrote:
>>> On 2012-12-06T22:25:40, Andrew Beekhof <and...@beekhof.net> wrote:
>>> 
>>>> But any failures of the nagios agents would count against the VM's
>>>> migration-threshold.
>>>> So if moving were the right thing to do, it would have done it
>>>> already.
>>> 
>>> OK. I think this was due to me still being stuck on the workings of an
>>> order constraint, but of course if the failures are instead attributed
>>> to the container, this would happen automatically already. True.
>>> 
>>> (Incidentally, I like "attribute", "ascribe" better than "delegate"
>>> because to me, they better fit what's going on, if we stuck with
>>> "delegate-failures". Just saying. ;-)
>>> 
>>>>> We already have on-fail settings. How would these play together?
>>>> Good question. My initial thought was that it would be up to on-fail
>>>> settings in the VM.
>>> 
>>> I'd prefer to keep that separate (as proposed below). Because if an
>>> action of the *VM* really fails, I may want an admin to look into it
>>> (why could the bloody hypervisor not start/stop it?), which is different
>>> from restarting the VM if one of the resources within it needs that.
>>> 
>>>>> Would it even make sense to have on-fail="restart-container"? (Or a
>>>>> nicer wording.)
>>>>> 
>>>>> Hmmm. That might work. We allow a "container" to be specified as a
>>>>> meta attribute.
>>>>> 
>>>>> If set, on-fail would default to restart container for most actions.
>>>>> But admins could actually modify it - say, they might want to set
>>>>> monitor on-fail="ignore" to just get notified. And when we move
>>>>> forward to whiteboxes, we could have start/monitor/promote/demote
>>>>> on-fail="restart" (like now) and stop on-fail="restart-container".
>>>>> 
>>>>> That appears reasonably neat?
>>>> It does actually.
>>>> I wasn't originally thinking it was necessary but it makes sense now
>>>> that you point it out.
>>> 
>>> Yes, I think I like this too now.
>> I like it too. Here comes the drafted code:
>> https://github.com/gao-yan/pacemaker/commit/4f7b80baa42f3801c1fb8186aef076877f34dfea
>> 
>> It works in my simple test. Although failures of resources aren't
>> counted against the container's migration-threshold yet, it shows you the
>> basic idea. I'd appreciate it if you could take a look first. It's very
>> likely I'm really on the right track this time. ;-)
> 
> I've thought about your implementation some more.  Have we discussed the 
> possibility of implicitly setting the order constraint internally when the 
> container attribute is set?  Also, it seems like now that we are mapping a 
> resource to a container resource in the meta-attributes, we could find a 
> shortcut to build the colocation relationship there as well.
> 
> What about something like this for the meta-attributes.
> 
> container="vm"  --- Internally this means 'on-fail=restart-container' and 
> 'order start vm then start rsc'
> with-container="true"  --- this means if container is set, go ahead and 
> colocate this rsc with the container.

what about:
    container-type=(black | white)

black: colocate with the vm
white: potentially other colocation or location constraints
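
To make the combined proposal concrete, here is a rough Python sketch of how
the meta-attributes discussed above might expand into implicit settings. It
is purely illustrative and is not Pacemaker code: the Resource class, the
implied_settings() helper and the tuple format are all hypothetical, and it
assumes that "container" always implies the start ordering and the
on-fail=restart-container default, while only container-type=black adds the
colocation.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for a resource and its meta-attributes."""
    name: str
    meta: dict = field(default_factory=dict)


def implied_settings(rsc):
    """Return (default on-fail, implicit constraints) for one resource.

    Assumption: 'container' implies the ordering and the on-fail default;
    'container-type=black' additionally implies colocation with the VM.
    """
    container = rsc.meta.get("container")
    if container is None:
        # No container set: nothing is implied.
        return None, []

    constraints = [("order", f"start {container} then start {rsc.name}")]
    if rsc.meta.get("container-type") == "black":
        constraints.append(("colocation", f"{rsc.name} with {container}"))
    # For 'white', any colocation or location constraints stay explicit.
    return "restart-container", constraints


if __name__ == "__main__":
    check = Resource("web-check", {"container": "vm", "container-type": "black"})
    print(implied_settings(check))
```

With container-type=white the helper returns only the ordering, leaving
placement to whatever constraints the admin configures separately.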

> 
> With something like the above, we can fully express the container and child 
> relationship without multiple (any) resource and colocation constraint sets.
> 
> Anyway, just an idea... I much prefer this container meta-attribute idea 
> and the failure-delegate idea over the restart-origin one now.  
> restart-origin seemed good at first, but it doesn't completely express what 
> we are doing; these other ideas seem to represent the relationship 
> between the resources better.  Great discussion everyone :)
> 
> -- Vossel
> 
> 
> 
>>> 
>>> Uhm. Would "container" imply ordering + colocation, or would we still
>>> need them grouped (resource_set'ed, whatever)?

Grouping might still be a good idea regardless, just so you can set the 
container field once.

>>> 
>>> My, design is hard. ;-)
>> :-)
>> 
>> 
>> Regards,
>>  Gao,Yan
>> --
>> Gao,Yan <y...@suse.com>
>> Software Engineer
>> China Server Team, SUSE.
>> 
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>> 

