On 24/01/2013, at 3:36 AM, David Vossel <dvos...@redhat.com> wrote:

> ----- Original Message -----
>> From: "Yan Gao" <y...@suse.com>
>> To: pacemaker@oss.clusterlabs.org
>> Sent: Monday, January 21, 2013 11:28:40 PM
>> Subject: Re: [Pacemaker] Enable remote monitoring
>>
>> Hi,
>> Here's the code for supporting nagios plugins in lrmd:
>>
>> https://github.com/gao-yan/pacemaker/commits/nagios
>>
>> A new resource class "nagios" is introduced.
>>
>> Actions:
>>
>> - probe: A resource defined for a resource container is not probed.
>> (We can also add a condition in pengine to just avoid probing a nagios
>> class resource.)
>
> Yeah, I think the pengine should know to never probe a nagios script
> regardless if it is involved in a container or not.
>
>> - start: Invokes the nagios plugin with specified parameters (Maps the
>> instance attributes to the long options of the nagios plugin). If it
>> returns non-OK, re-invokes it after some delay (delay = start_timeout /
>> 10), until it returns OK or exceeds the start timeout.
>
> I made a comment about this on the patch. Shouldn't the cmd->timeout value
> be updated each time it is re-scheduled to account for time already spent?
>
>> - monitor: Recurring invocation to the nagios plugin with specified
>> parameters.
>>
>> - stop: Nothing special is done. The recurring monitor is canceled
>> anyway.
>>
>> - metadata: Reads the corresponding metadata from a xml file in
>> NAGIOS_METADATA_DIR.
>>
>> (As we know nagios plugins don't support metadata. The current plan is
>> to generate the corresponding metadata according to the help of the
>> plugins, and put them into NAGIOS_METADATA_DIR for use -- Dejan already
>> has progress on this. Thank, Dejan!)
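
[Illustration, not part of the patch under discussion.] The start behaviour described in the quoted text, combined with the suggestion to account for time already spent, could be sketched roughly as below. This is Python rather than the lrmd C code, and `start_with_retry`, `run_plugin`, `clock` and `sleep` are hypothetical names chosen for the example. It assumes the plugin runner accepts the remaining timeout in seconds and returns the plugin's exit code.

```python
import time

STATE_OK = 0  # nagios plugin exit code for success, as listed in the thread


def start_with_retry(run_plugin, start_timeout, clock=time.monotonic, sleep=time.sleep):
    """Re-invoke run_plugin until it returns STATE_OK or start_timeout elapses.

    run_plugin(remaining) is a hypothetical callable: it receives the time
    still available (seconds) and returns the plugin's exit code.
    Returns True if the plugin reported OK in time, False otherwise.
    """
    deadline = clock() + start_timeout
    delay = start_timeout / 10  # retry interval from the proposal

    while True:
        # Recompute the budget on every attempt so that time already
        # spent (in earlier runs and in the delays) is accounted for.
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        if run_plugin(remaining) == STATE_OK:
            return True
        # Never sleep past the overall deadline.
        sleep(min(delay, max(0.0, deadline - clock())))
```

Passing the shrinking `remaining` value to each attempt is the point of the sketch: a single attempt cannot then outlive the overall start timeout.
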
>>
>>
>> For nagios plugins, the exit code are:
>>
>> STATE_OK = 0,
>> STATE_WARNING = 1,
>> STATE_CRITICAL = 2,
>> STATE_UNKNOWN = 3,
>> STATE_DEPENDENT = 4,
>>
>> AFAICS, STATE_OK should map to PCMK_EXECRA_OK, and the others should all
>> belong to PCMK_EXECRA_UNKNOWN_ERROR. Well, apparently, there's no code
>> to express "NOT_RUNNING" in nagios plugins. I think it should be fine,
>> since there's no probe.
>>
>> Any suggestions are appreciated!
>
> This mostly looks like what I expected. I'm letting the whole re-scheduling
> of the start operation roll around in my head a bit. It almost seems like
> that functionality belongs in the service library... retry executing this
> action until either the timeout is hit or some target return code is
> encountered. Any thoughts on that?
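
[Illustration, not part of the patch under discussion.] The exit-code mapping proposed in the quoted message could be sketched as follows. The Python names are hypothetical. The two `PCMK_EXECRA_*` values are represented as placeholder strings because the thread does not give their numeric values.

```python
from enum import IntEnum


class NagiosState(IntEnum):
    """Nagios plugin exit codes as listed in the quoted message."""
    STATE_OK = 0
    STATE_WARNING = 1
    STATE_CRITICAL = 2
    STATE_UNKNOWN = 3
    STATE_DEPENDENT = 4


# Placeholders only: these stand in for the Pacemaker constants by name.
PCMK_EXECRA_OK = "PCMK_EXECRA_OK"
PCMK_EXECRA_UNKNOWN_ERROR = "PCMK_EXECRA_UNKNOWN_ERROR"


def map_nagios_rc(rc: int) -> str:
    """Map a nagios exit code to the proposed Pacemaker result.

    Only STATE_OK counts as success; every other code, including any
    value outside the listed range, is treated as an unknown error.
    There is no "not running" result because nagios resources are
    not probed in this proposal.
    """
    if rc == NagiosState.STATE_OK:
        return PCMK_EXECRA_OK
    return PCMK_EXECRA_UNKNOWN_ERROR
```
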

Who the what now? Why do start ops need to be rescheduled?

> -- Vossel
>
>> Thanks,
>> Gao,Yan
>>
>> --
>> Gao,Yan <y...@suse.com>
>> Software Engineer
>> China Server Team, SUSE.
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org