On Fri, Feb 1, 2013 at 3:37 PM, Gao,Yan <y...@suse.com> wrote:
> Hi Andrew,
>
> On 01/31/13 14:35, Andrew Beekhof wrote:
>>
>> On 24/01/2013, at 3:36 AM, David Vossel <dvos...@redhat.com> wrote:
>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Yan Gao" <y...@suse.com>
>>>> To: pacemaker@oss.clusterlabs.org
>>>> Sent: Monday, January 21, 2013 11:28:40 PM
>>>> Subject: Re: [Pacemaker] Enable remote monitoring
>>>>
>>>> Hi,
>>>> Here's the code for supporting nagios plugins in lrmd:
>>>>
>>>> https://github.com/gao-yan/pacemaker/commits/nagios
>>>>
>>>> A new resource class "nagios" is introduced.
>>>>
>>>> Actions:
>>>>
>>>> - probe: A resource defined for a resource container is not probed.
>>>> (We can also add a condition in pengine to just avoid probing a nagios
>>>> class resource.)
>>>
>>> Yeah, I think the pengine should know to never probe a nagios script
>>> regardless if it is involved in a container or not.
>>>
>>>> - start: Invokes the nagios plugin with specified parameters (Maps the
>>>> instance attributes to the long options of the nagios plugin). If it
>>>> returns non-OK, re-invokes it after some delay (delay = start_timeout /
>>>> 10), until it returns OK or exceeds the start timeout.
>>>
>>> I made a comment about this on the patch. Shouldn't the cmd->timeout value
>>> be updated each time it is re-scheduled to account for time already spent?
>>>
>>>>
>>>> - monitor: Recurring invocation to the nagios plugin with specified
>>>> parameters.
>>>>
>>>> - stop: Nothing special is done. The recurring monitor is canceled
>>>> anyway.
>>>>
>>>> - metadata: Reads the corresponding metadata from a xml file in
>>>> NAGIOS_METADATA_DIR.
>>>>
>>>> (As we know nagios plugins don't support metadata. The current plan is
>>>> to generate the corresponding metadata according to the help of the
>>>> plugins, and put them into NAGIOS_METADATA_DIR for use -- Dejan already
>>>> has progress on this. Thank, Dejan!)
>>>>
>>>>
>>>> For nagios plugins, the exit code are:
>>>>
>>>> STATE_OK = 0,
>>>> STATE_WARNING = 1,
>>>> STATE_CRITICAL = 2,
>>>> STATE_UNKNOWN = 3,
>>>> STATE_DEPENDENT = 4,
>>>>
>>>> AFAICS, STATE_OK should map to PCMK_EXECRA_OK, and the others should all
>>>> belong to PCMK_EXECRA_UNKNOWN_ERROR. Well, apparently, there's no code
>>>> to express "NOT_RUNNING" in nagios plugins. I think it should be fine,
>>>> since there's no probe.
>>>>
>>>> Any suggestions are appreciated!
>>>
>>> This mostly looks like what I expected. I'm letting the whole
>>> re-scheduling of the start operation roll around in my head a bit. It
>>> almost seems like that functionality belongs in the service library...
>>> retry executing this action until either the timeout is hit or some target
>>> return code is encountered. Any thoughts on that?
>>
>> Who the what now?
>> Why do start ops need to be rescheduled?
> It's very likely that the "start" of the container returns before the
> services inside are started. Abusing start-delay is not preferred. The
> idea is, in the start operation of the nagios resource, repeatedly
> monitoring the service until it returns OK or exceeds the start timeout.

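For illustration, the retry idea quoted above (re-invoke the plugin every start_timeout / 10 until it returns OK or the start timeout expires), together with the proposed exit code mapping and David's point about accounting for time already spent, might be sketched roughly as follows. This is a Python sketch under stated assumptions, not the lrmd code: run_plugin, map_nagios_rc and start_with_retry are hypothetical names, and the numeric values given to the PCMK_EXECRA_* constants are assumptions made only so the example runs.

```python
import time

# Nagios plugin exit codes, as listed in the quoted message.
STATE_OK = 0
STATE_WARNING = 1
STATE_CRITICAL = 2
STATE_UNKNOWN = 3
STATE_DEPENDENT = 4

# Stand-ins for the PCMK_EXECRA_* results named in the thread.
# The numeric values are assumptions for illustration only.
PCMK_EXECRA_OK = 0
PCMK_EXECRA_UNKNOWN_ERROR = 1


def map_nagios_rc(rc):
    """Map a plugin exit code as proposed: STATE_OK is OK, anything else is an error."""
    return PCMK_EXECRA_OK if rc == STATE_OK else PCMK_EXECRA_UNKNOWN_ERROR


def start_with_retry(run_plugin, start_timeout,
                     clock=time.monotonic, sleep=time.sleep):
    """Re-invoke run_plugin until it reports OK or start_timeout expires.

    run_plugin(timeout) is a hypothetical callable that runs the plugin once
    with the given per-invocation timeout and returns its exit code.
    """
    delay = start_timeout / 10.0        # pause between attempts, per the proposal
    deadline = clock() + start_timeout  # overall budget for the start action
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return PCMK_EXECRA_UNKNOWN_ERROR
        # Hand each attempt only the time still left, so that repeated
        # re-scheduling cannot overrun the original start timeout.
        if map_nagios_rc(run_plugin(remaining)) == PCMK_EXECRA_OK:
            return PCMK_EXECRA_OK
        remaining = deadline - clock()
        if remaining <= 0:
            return PCMK_EXECRA_UNKNOWN_ERROR
        sleep(min(delay, remaining))
```

The clock and sleep parameters exist only so the loop can be exercised without real waiting; whether this logic belongs in the resource class or in the service library is the open question in the thread.
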
I thought both stop and start were no-ops and only monitor did anything?
Did we move on from that (I can see why we might; my memory is just a
little hazy on the subject)?

>
> The latest code for supporting nagios plugin in lrmd is in:
> https://github.com/gao-yan/pacemaker/commits/nagios
>
> And the code for supporting container in policy engine is still in:
> https://github.com/ClusterLabs/pacemaker/pull/195

Top of my list. Firing up web browser now...

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org