On 02/05/13 16:29, Andrew Beekhof wrote:
> On Fri, Feb 1, 2013 at 3:37 PM, Gao,Yan <y...@suse.com> wrote:
>> Hi Andrew,
>>
>> On 01/31/13 14:35, Andrew Beekhof wrote:
>>>
>>> On 24/01/2013, at 3:36 AM, David Vossel <dvos...@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "Yan Gao" <y...@suse.com>
>>>>> To: pacemaker@oss.clusterlabs.org
>>>>> Sent: Monday, January 21, 2013 11:28:40 PM
>>>>> Subject: Re: [Pacemaker] Enable remote monitoring
>>>>>
>>>>> Hi,
>>>>> Here's the code for supporting nagios plugins in lrmd:
>>>>>
>>>>> https://github.com/gao-yan/pacemaker/commits/nagios
>>>>>
>>>>> A new resource class "nagios" is introduced.
>>>>>
>>>>> Actions:
>>>>>
>>>>> - probe: A resource defined for a resource container is not probed.
>>>>> (We can also add a condition in pengine to just avoid probing a
>>>>> nagios class resource.)
>>>>
>>>> Yeah, I think the pengine should know to never probe a nagios script,
>>>> regardless of whether it is involved in a container or not.
>>>>
>>>>> - start: Invokes the nagios plugin with the specified parameters (maps
>>>>> the instance attributes to the long options of the nagios plugin). If
>>>>> it returns non-OK, re-invokes it after some delay (delay =
>>>>> start_timeout / 10), until it returns OK or exceeds the start timeout.
>>>>
>>>> I made a comment about this on the patch.  Shouldn't the cmd->timeout 
>>>> value be updated each time it is re-scheduled to account for time already 
>>>> spent?
>>>>
>>>>>
>>>>> - monitor: Recurring invocation of the nagios plugin with the specified
>>>>> parameters.
>>>>>
>>>>> - stop: Nothing special is done. The recurring monitor is canceled
>>>>> anyway.
>>>>>
>>>>> - metadata: Reads the corresponding metadata from an XML file in
>>>>> NAGIOS_METADATA_DIR.
>>>>>
>>>>> (As we know, nagios plugins don't support metadata. The current plan
>>>>> is to generate the corresponding metadata from the help output of the
>>>>> plugins and put it into NAGIOS_METADATA_DIR for use -- Dejan has
>>>>> already made progress on this. Thanks, Dejan!)
>>>>>
>>>>>
>>>>> For nagios plugins, the exit codes are:
>>>>>
>>>>> STATE_OK        = 0,
>>>>> STATE_WARNING   = 1,
>>>>> STATE_CRITICAL  = 2,
>>>>> STATE_UNKNOWN   = 3,
>>>>> STATE_DEPENDENT = 4,
>>>>>
>>>>> AFAICS, STATE_OK should map to PCMK_EXECRA_OK, and the others should
>>>>> all map to PCMK_EXECRA_UNKNOWN_ERROR. Apparently, there's no code to
>>>>> express "NOT_RUNNING" in nagios plugins. I think that should be fine,
>>>>> since there's no probe.
>>>>>
>>>>> Any suggestions are appreciated!
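
To make the exit-code mapping quoted above concrete, here is a small
illustrative sketch in Python. It is not the code in the branch (lrmd is
written in C). The function name map_nagios_rc is hypothetical, and the
numeric values given for PCMK_EXECRA_OK and PCMK_EXECRA_UNKNOWN_ERROR are
assumed to follow the usual OCF convention (0 and 1).

```python
from enum import IntEnum


class NagiosState(IntEnum):
    """Exit codes of nagios plugins, as listed in the mail above."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3
    DEPENDENT = 4


# Assumption: values follow the OCF convention (0 = success, 1 = generic error).
PCMK_EXECRA_OK = 0
PCMK_EXECRA_UNKNOWN_ERROR = 1


def map_nagios_rc(rc: int) -> int:
    """Map a nagios plugin exit code to the proposed lrmd return code.

    Only STATE_OK counts as success; every other code, including ones
    outside the nagios range, is treated as an unknown error. There is
    deliberately no "not running" result, since nagios resources are
    never probed.
    """
    if rc == NagiosState.OK:
        return PCMK_EXECRA_OK
    return PCMK_EXECRA_UNKNOWN_ERROR
```

With this mapping, a WARNING from a plugin is treated the same as CRITICAL,
which matches the proposal quoted above.
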
>>>>
>>>> This mostly looks like what I expected.  I'm letting the whole 
>>>> re-scheduling of the start operation roll around in my head a bit.  It 
>>>> almost seems like that functionality belongs in the service library...  
>>>> retry executing this action until either the timeout is hit or some target 
>>>> return code is encountered.  Any thoughts on that?
>>>
>>> Who the what now?
>>> Why do start ops need to be rescheduled?
>> It's very likely that the "start" of the container returns before the
>> services inside are started. Abusing start-delay is not preferred. The
>> idea is that the start operation of the nagios resource repeatedly
>> monitors the service until it returns OK or the start timeout is exceeded.
> 
> I thought both stop and start were a no-op and only monitor did anything?
> Did we move on from that (I can see why we might, my memory is just a
> little hazy on the subject)?
AFAICT, retrying within the start op avoids unnecessary increments of
fail-count during the window before the services inside the container are
up. Yes, stop is actually a no-op, since the existing monitor will be
canceled anyway.
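
Purely as an illustration of the retry idea (this is a Python sketch, not
the C code in the branch), the start operation could look roughly like the
following. The names start_with_retry and check are hypothetical; check
stands in for one invocation of the nagios plugin and returns its exit
code. The sketch also passes the remaining time to each invocation, which
is one way to address David's point about accounting for time already
spent.

```python
import time
from typing import Callable


def start_with_retry(
    check: Callable[[float], int],
    start_timeout: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Re-invoke `check` until it returns 0 or the start timeout expires.

    `check` receives the time still left before the deadline, so each
    invocation can use a timeout that accounts for time already spent.
    The delay between attempts is start_timeout / 10, as described in
    the thread.
    """
    deadline = clock() + start_timeout
    delay = start_timeout / 10

    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # start timeout exceeded
        if check(remaining) == 0:
            return True  # plugin reported OK
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
```

The clock and sleep parameters exist only so the loop can be exercised
without real waiting; a caller would normally leave them at their defaults.
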

> 
>>
>> The latest code for supporting nagios plugins in lrmd is at:
>> https://github.com/gao-yan/pacemaker/commits/nagios
>>
>> And the code for supporting containers in the policy engine is still at:
>> https://github.com/ClusterLabs/pacemaker/pull/195
> 
> Top of my list. Firing up web browser now...
Thanks!

Regards,
  Gao,Yan
-- 
Gao,Yan <y...@suse.com>
Software Engineer
China Server Team, SUSE.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
