On 02/05/13 16:29, Andrew Beekhof wrote:
> On Fri, Feb 1, 2013 at 3:37 PM, Gao,Yan <y...@suse.com> wrote:
>> Hi Andrew,
>>
>> On 01/31/13 14:35, Andrew Beekhof wrote:
>>>
>>> On 24/01/2013, at 3:36 AM, David Vossel <dvos...@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "Yan Gao" <y...@suse.com>
>>>>> To: pacemaker@oss.clusterlabs.org
>>>>> Sent: Monday, January 21, 2013 11:28:40 PM
>>>>> Subject: Re: [Pacemaker] Enable remote monitoring
>>>>>
>>>>> Hi,
>>>>> Here's the code for supporting nagios plugins in lrmd:
>>>>>
>>>>> https://github.com/gao-yan/pacemaker/commits/nagios
>>>>>
>>>>> A new resource class "nagios" is introduced.
>>>>>
>>>>> Actions:
>>>>>
>>>>> - probe: A resource defined for a resource container is not probed.
>>>>> (We can also add a condition in pengine to just avoid probing a
>>>>> nagios class resource.)
>>>>
>>>> Yeah, I think the pengine should know to never probe a nagios script
>>>> regardless of whether it is involved in a container or not.
>>>>
>>>>> - start: Invokes the nagios plugin with specified parameters (maps
>>>>> the instance attributes to the long options of the nagios plugin).
>>>>> If it returns non-OK, re-invokes it after some delay
>>>>> (delay = start_timeout / 10), until it returns OK or exceeds the
>>>>> start timeout.
>>>>
>>>> I made a comment about this on the patch. Shouldn't the cmd->timeout
>>>> value be updated each time it is re-scheduled to account for time already
>>>> spent?
>>>>
>>>>>
>>>>> - monitor: Recurring invocation of the nagios plugin with specified
>>>>> parameters.
>>>>>
>>>>> - stop: Nothing special is done. The recurring monitor is canceled
>>>>> anyway.
>>>>>
>>>>> - metadata: Reads the corresponding metadata from an XML file in
>>>>> NAGIOS_METADATA_DIR.
>>>>>
>>>>> (As we know nagios plugins don't support metadata.
>>>>> The current plan is to generate the corresponding metadata according
>>>>> to the help of the plugins, and put them into NAGIOS_METADATA_DIR
>>>>> for use -- Dejan already has progress on this. Thanks, Dejan!)
>>>>>
>>>>>
>>>>> For nagios plugins, the exit codes are:
>>>>>
>>>>> STATE_OK = 0,
>>>>> STATE_WARNING = 1,
>>>>> STATE_CRITICAL = 2,
>>>>> STATE_UNKNOWN = 3,
>>>>> STATE_DEPENDENT = 4,
>>>>>
>>>>> AFAICS, STATE_OK should map to PCMK_EXECRA_OK, and the others should
>>>>> all belong to PCMK_EXECRA_UNKNOWN_ERROR. Well, apparently, there's no
>>>>> code to express "NOT_RUNNING" in nagios plugins. I think it should be
>>>>> fine, since there's no probe.
>>>>>
>>>>> Any suggestions are appreciated!
>>>>
>>>> This mostly looks like what I expected. I'm letting the whole
>>>> re-scheduling of the start operation roll around in my head a bit. It
>>>> almost seems like that functionality belongs in the service library...
>>>> retry executing this action until either the timeout is hit or some target
>>>> return code is encountered. Any thoughts on that?
>>>
>>> Who the what now?
>>> Why do start ops need to be rescheduled?
>> It's very likely that the "start" of the container returns before the
>> services inside are started. Abusing start-delay is not preferred. The
>> idea is, in the start operation of the nagios resource, repeatedly
>> monitoring the service until it returns OK or exceeds the start timeout.
>
> I thought both stop and start were a no-op and only monitor did anything?
> Did we move on from that (I can see why we might, my memory is just a
> little hazy on the subject)?

AFAICT, doing that in the start op avoids unnecessary increments of
fail-count while the services inside the container are still coming up.
Yes, stop is actually a no-op, since the existing monitor will be canceled
anyway.
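
To make the retry idea concrete, here is a rough sketch of the loop in
Python rather than the lrmd's C. It is an illustration only, not the code
in the branch: start_with_retry, check, clock and sleep are hypothetical
names. It uses delay = start_timeout / 10 as described above, and it keeps
a fixed deadline so that time already spent is taken into account on every
re-schedule, which is the point David raised about cmd->timeout.

```python
import time
from typing import Callable

STATE_OK = 0  # nagios plugin exit code for success


def start_with_retry(
    check: Callable[[float], int],
    start_timeout: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Re-invoke `check` until it returns STATE_OK or start_timeout elapses.

    `check` is a hypothetical callable that runs the plugin once and
    returns its exit code. It receives the time budget still available,
    so a single attempt can be bounded by what is left.
    """
    delay = start_timeout / 10          # delay between attempts
    deadline = clock() + start_timeout  # fixed point in time, never reset

    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return False                # start timeout exceeded
        if check(remaining) == STATE_OK:
            return True                 # service is up, start succeeded
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
```

With a 10 second start timeout and a service that comes up on the third
attempt, the attempts see budgets of 10, 9 and 8 seconds, and no failure
is reported in between.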
>
>>
>> The latest code for supporting nagios plugin in lrmd is in:
>> https://github.com/gao-yan/pacemaker/commits/nagios
>>
>> And the code for supporting container in policy engine is still in:
>> https://github.com/ClusterLabs/pacemaker/pull/195
>
> Top of my list. Firing up web browser now...

Thanks!

Regards,
  Gao,Yan
--
Gao,Yan <y...@suse.com>
Software Engineer
China Server Team, SUSE.
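
For reference, the exit-code mapping and the attribute-to-option mapping
proposed earlier in the thread could be sketched as below. This is Python
for illustration only, not the lrmd implementation. The string values
given to PCMK_EXECRA_OK and PCMK_EXECRA_UNKNOWN_ERROR are placeholders,
map_exit_code and build_argv are hypothetical helpers, and passing each
attribute as "--name value" is an assumption about how the long options
are formed.

```python
from typing import Dict, List

# Nagios plugin exit codes, as listed in the thread.
STATE_OK = 0
STATE_WARNING = 1
STATE_CRITICAL = 2
STATE_UNKNOWN = 3
STATE_DEPENDENT = 4

# Placeholders for the Pacemaker result codes named in the thread.
# The real constants are C enum values; these strings are stand-ins.
PCMK_EXECRA_OK = "PCMK_EXECRA_OK"
PCMK_EXECRA_UNKNOWN_ERROR = "PCMK_EXECRA_UNKNOWN_ERROR"


def map_exit_code(code: int) -> str:
    """STATE_OK maps to OK; every other exit code is an unknown error."""
    if code == STATE_OK:
        return PCMK_EXECRA_OK
    return PCMK_EXECRA_UNKNOWN_ERROR


def build_argv(plugin: str, attrs: Dict[str, str]) -> List[str]:
    """Turn instance attributes into long options for the plugin.

    Assumes each attribute name is already a valid long option name.
    """
    argv = [plugin]
    for name, value in attrs.items():
        argv += [f"--{name}", value]
    return argv
```

For example, build_argv("check_tcp", {"hostname": "vm1", "port": "22"})
gives ["check_tcp", "--hostname", "vm1", "--port", "22"]. Note that
nothing maps to a "not running" result, which matches the observation that
these resources are never probed.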