On Fri, Feb 1, 2013 at 3:37 PM, Gao,Yan <y...@suse.com> wrote:
> Hi Andrew,
>
> On 01/31/13 14:35, Andrew Beekhof wrote:
>>
>> On 24/01/2013, at 3:36 AM, David Vossel <dvos...@redhat.com> wrote:
>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Yan Gao" <y...@suse.com>
>>>> To: pacemaker@oss.clusterlabs.org
>>>> Sent: Monday, January 21, 2013 11:28:40 PM
>>>> Subject: Re: [Pacemaker] Enable remote monitoring
>>>>
>>>> Hi,
>>>> Here's the code for supporting nagios plugins in lrmd:
>>>>
>>>> https://github.com/gao-yan/pacemaker/commits/nagios
>>>>
>>>> A new resource class "nagios" is introduced.
>>>>
>>>> Actions:
>>>>
>>>> - probe: A resource defined for a resource container is not probed. (We
>>>> can also add a condition in pengine to just avoid probing a nagios class
>>>> resource.)
>>>
>>> Yeah, I think the pengine should know to never probe a nagios script,
>>> regardless of whether it is involved in a container or not.
>>>
>>>> - start: Invokes the nagios plugin with the specified parameters (maps the
>>>> instance attributes to the long options of the nagios plugin). If it
>>>> returns non-OK, re-invokes it after a delay (delay = start_timeout / 10)
>>>> until it returns OK or the start timeout is exceeded.
>>>
>>> I made a comment about this on the patch.  Shouldn't the cmd->timeout value 
>>> be updated each time it is re-scheduled to account for time already spent?
>>>
>>>>
>>>> - monitor: Recurring invocation of the nagios plugin with the specified
>>>> parameters.
>>>>
>>>> - stop: Nothing special is done. The recurring monitor is canceled
>>>> anyway.
>>>>
>>>> - metadata: Reads the corresponding metadata from an XML file in
>>>> NAGIOS_METADATA_DIR.
>>>>
>>>> (As we know, nagios plugins don't support metadata. The current plan is
>>>> to generate the corresponding metadata from the help output of the
>>>> plugins and put it into NAGIOS_METADATA_DIR for use -- Dejan already
>>>> has progress on this. Thanks, Dejan!)
>>>>
>>>>
>>>> For nagios plugins, the exit codes are:
>>>>
>>>> STATE_OK        = 0,
>>>> STATE_WARNING   = 1,
>>>> STATE_CRITICAL  = 2,
>>>> STATE_UNKNOWN   = 3,
>>>> STATE_DEPENDENT = 4,
>>>>
>>>> AFAICS, STATE_OK should map to PCMK_EXECRA_OK, and the others should all
>>>> belong to PCMK_EXECRA_UNKNOWN_ERROR. Well, apparently, there's no code
>>>> to express "NOT_RUNNING" in nagios plugins. I think it should be fine,
>>>> since there's no probe.
>>>>
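The mapping proposed above can be sketched roughly as follows. This is only an illustration in Python, not the code in the nagios branch: the function name `map_nagios_exit_code` is hypothetical, and the result codes are represented by their names as strings rather than by their real numeric values.

```python
from enum import IntEnum


class NagiosState(IntEnum):
    """Nagios plugin exit codes as listed in the message above."""
    STATE_OK = 0
    STATE_WARNING = 1
    STATE_CRITICAL = 2
    STATE_UNKNOWN = 3
    STATE_DEPENDENT = 4


def map_nagios_exit_code(code: int) -> str:
    """Map a plugin exit code to the name of an lrmd result code.

    Hypothetical helper: only STATE_OK counts as success; every other
    exit code (including ones outside the list) is an unknown error.
    There is deliberately no "not running" result, since the proposal
    relies on nagios resources never being probed.
    """
    if code == NagiosState.STATE_OK:
        return "PCMK_EXECRA_OK"
    return "PCMK_EXECRA_UNKNOWN_ERROR"
```

With this sketch, `map_nagios_exit_code(NagiosState.STATE_WARNING)` yields the unknown-error name, which matches the "all others" rule in the proposal.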
>>>> Any suggestions are appreciated!
>>>
>>> This mostly looks like what I expected.  I'm letting the whole 
>>> re-scheduling of the start operation roll around in my head a bit.  It 
>>> almost seems like that functionality belongs in the service library...  
>>> retry executing this action until either the timeout is hit or some target 
>>> return code is encountered.  Any thoughts on that?
>>
>> Who the what now?
>> Why do start ops need to be rescheduled?
> It's very likely that the "start" of the container returns before the
> services inside are started. Abusing start-delay is not preferred. The
> idea is for the start operation of the nagios resource to repeatedly
> monitor the service until it returns OK or the start timeout is exceeded.
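A rough sketch of the retry idea described in the thread, in Python rather than the C used by lrmd. The function name `start_with_retry`, the `check` callback, and the injectable `clock` and `sleep` parameters are all assumptions made for illustration. The delay of start_timeout / 10 comes from the original proposal, and passing the remaining time to each invocation reflects David's comment that the timeout should account for time already spent.

```python
import time
from typing import Callable


def start_with_retry(
    check: Callable[[float], int],
    start_timeout: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Re-run `check` until it returns 0 or `start_timeout` seconds elapse.

    `check` receives the time still available, so a single invocation
    can be bounded by the remaining budget instead of the full timeout.
    """
    delay = start_timeout / 10          # retry interval from the proposal
    deadline = clock() + start_timeout  # absolute point where start fails

    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return False                # start timeout exceeded
        if check(remaining) == 0:
            return True                 # plugin reported OK
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(delay, remaining))    # never sleep past the deadline
```

Tracking an absolute deadline keeps the total time bounded by the start timeout however long each plugin invocation takes, provided `check` itself honors the remaining time it is given.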

I thought both stop and start were a no-op and only monitor did anything?
Did we move on from that (I can see why we might, my memory is just a
little hazy on the subject)?

>
> The latest code for supporting nagios plugin in lrmd is in:
> https://github.com/gao-yan/pacemaker/commits/nagios
>
> And the code for supporting container in policy engine is still in:
> https://github.com/ClusterLabs/pacemaker/pull/195

Top of my list. Firing up web browser now...

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org