Re: [Pacemaker] Enable remote monitoring

Andrew Beekhof Wed, 30 Jan 2013 22:38:54 -0800

On 24/01/2013, at 3:36 AM, David Vossel <dvos...@redhat.com> wrote:

> 
> 
> ----- Original Message -----
>> From: "Yan Gao" <y...@suse.com>
>> To: pacemaker@oss.clusterlabs.org
>> Sent: Monday, January 21, 2013 11:28:40 PM
>> Subject: Re: [Pacemaker] Enable remote monitoring
>> 
>> Hi,
>> Here's the code for supporting nagios plugins in lrmd:
>> 
>> https://github.com/gao-yan/pacemaker/commits/nagios
>> 
>> A new resource class "nagios" is introduced.
>> 
>> Actions:
>> 
>> - probe: A resource defined for a resource container is not probed.
>> (We can also add a condition in pengine to just avoid probing a nagios
>> class resource.)
> 
> Yeah, I think the pengine should know to never probe a nagios script,
> regardless of whether it is involved in a container or not.
> 
>> - start: Invokes the nagios plugin with the specified parameters (maps
>> the instance attributes to the long options of the nagios plugin). If
>> it returns non-OK, re-invokes it after a delay (delay = start_timeout /
>> 10) until it returns OK or the start timeout is exceeded.
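
To make the attribute-to-option mapping described above concrete, here is a minimal sketch in Python (not the lrmd C code from the patch). The function name build_plugin_argv, the plugin path, and the rule that an empty value becomes a bare flag are assumptions made for illustration only; the patch may handle these differently.

```python
def build_plugin_argv(plugin_path, params):
    """Turn resource instance attributes into long options for a plugin.

    Hypothetical helper: each attribute NAME=VALUE becomes "--NAME VALUE".
    """
    argv = [plugin_path]
    for name, value in params.items():
        argv.append(f"--{name}")
        # Assumption of this sketch: an empty value means a bare flag.
        if value != "":
            argv.append(str(value))
    return argv


# Example with an illustrative plugin path and attributes.
example_argv = build_plugin_argv(
    "/usr/lib/nagios/plugins/check_tcp",
    {"hostname": "192.168.122.10", "port": "22"},
)
```

With these inputs, example_argv holds the plugin path followed by "--hostname", the address, "--port", and the port number, in that order.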
> 
> I made a comment about this on the patch.  Shouldn't the cmd->timeout value 
> be updated each time it is re-scheduled to account for time already spent?
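
The retry behaviour under discussion, with the timeout shrinking on every attempt, could look roughly like the following Python sketch. It is not the code from the patch: start_with_retries, run_plugin, and the injectable clock and sleep parameters are hypothetical names chosen here so that the logic can be exercised without real waiting. The only parts taken from the thread are the delay of start_timeout / 10 and the idea of charging time already spent against the remaining timeout.

```python
import time


def start_with_retries(run_plugin, start_timeout,
                       clock=time.monotonic, sleep=time.sleep):
    """Re-invoke run_plugin until it returns 0 or start_timeout elapses.

    run_plugin(remaining) is a hypothetical callable that runs the plugin
    with the given timeout in seconds and returns its exit code.
    """
    deadline = clock() + start_timeout
    delay = start_timeout / 10  # retry interval described in the thread
    while True:
        # Recompute the budget each round so time already spent counts.
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        if run_plugin(remaining) == 0:
            return True
        # Never sleep past the overall deadline.
        sleep(min(delay, max(0.0, deadline - clock())))
```

Working from a fixed deadline rather than a per-attempt timeout means the total time spent in "start" stays bounded by start_timeout, however many retries happen.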
> 
>> 
>> - monitor: Recurring invocation of the nagios plugin with the specified
>> parameters.
>> 
>> - stop: Nothing special is done. The recurring monitor is canceled
>> anyway.
>> 
>> - metadata: Reads the corresponding metadata from an XML file in
>> NAGIOS_METADATA_DIR.
>> 
>> (As we know, nagios plugins don't support metadata. The current plan is
>> to generate the corresponding metadata from the plugins' help output
>> and put it into NAGIOS_METADATA_DIR for use -- Dejan has already made
>> progress on this. Thanks, Dejan!)
>> 
>> 
>> For nagios plugins, the exit codes are:
>> 
>> STATE_OK        = 0,
>> STATE_WARNING   = 1,
>> STATE_CRITICAL  = 2,
>> STATE_UNKNOWN   = 3,
>> STATE_DEPENDENT = 4,
>> 
>> AFAICS, STATE_OK should map to PCMK_EXECRA_OK, and the others should all
>> belong to PCMK_EXECRA_UNKNOWN_ERROR. Well, apparently, there's no code
>> to express "NOT_RUNNING" in nagios plugins. I think it should be fine,
>> since there's no probe.
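
The proposed mapping can be written down as a small Python sketch. The nagios states are the ones listed above. The numeric values given here for PCMK_EXECRA_OK and PCMK_EXECRA_UNKNOWN_ERROR are assumptions that follow the OCF convention (0 for success, 1 for a generic error), and nagios_to_execra is a hypothetical name, not a function from the patch.

```python
from enum import IntEnum


class NagiosState(IntEnum):
    """Exit codes used by nagios plugins, as listed in the thread."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3
    DEPENDENT = 4


# Assumed values, mirroring the OCF convention; check the services
# library headers for the authoritative definitions.
PCMK_EXECRA_OK = 0
PCMK_EXECRA_UNKNOWN_ERROR = 1


def nagios_to_execra(exit_code):
    """Map a nagios exit code to the proposed Pacemaker return code."""
    if exit_code == NagiosState.OK:
        return PCMK_EXECRA_OK
    # WARNING, CRITICAL, UNKNOWN, DEPENDENT and anything unexpected
    # all collapse to a generic error; nothing maps to "not running".
    return PCMK_EXECRA_UNKNOWN_ERROR
```

Because no nagios state corresponds to "not running", this mapping only makes sense if such resources are never probed, which is the point made above.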
>> 
>> Any suggestions are appreciated!
> 
> This mostly looks like what I expected.  I'm letting the whole re-scheduling 
> of the start operation roll around in my head a bit.  It almost seems like 
> that functionality belongs in the service library...  retry executing this 
> action until either the timeout is hit or some target return code is 
> encountered.  Any thoughts on that?


Who the what now?
Why do start ops need to be rescheduled?

> 
> -- Vossel
> 
>> Thanks,
>>  Gao,Yan
>> 
>> --
>> Gao,Yan <y...@suse.com>
>> Software Engineer
>> China Server Team, SUSE.
>> 
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>> 
