On 4/27/10 2:15 PM, "Justin Lloyd" <jll...@digitalglobe.com> wrote:
> This is a follow-on to my original thread about creating a Solaris SMF
> service. Since I'm no longer doing that and have decided to let Zenoss
> do that, I was curious about what others are doing along these lines. A
> couple of things came to mind as I was mulling over how to do this.
> 
> Using SNMP to monitor cf-execd processes is probably the best way,
> except for a caveat I just learned about: the SNMP MIB entries end
> with the processes' PIDs, and Cfengine by default restarts itself at
> 5 AM, which would lead to unnecessary alerts. Is that restart necessary
> and, if so, what's a good way to handle monitoring cf-execd?

This is probably common knowledge, but if you do want to use SNMP, I've had
better luck with this sort of thing when I bundle the heavy "is it OK" logic
in a local script that runs and dumps its state to some /tmp/file, then use
an snmpd.conf "extend" entry to cat the file (which always returns quickly).
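
A minimal sketch of that pattern, in Python, assuming pgrep is available on
the host. The script name, the state-file path, and the one-line
"OK"/"CRITICAL" format are all made up for illustration, not something
Cfengine or Zenoss prescribes. The script would be run from cron or similar,
and snmpd only ever cats the result.

```python
#!/usr/bin/env python3
"""Check for a process by name and write a one-line state file for snmpd to cat.

Hypothetical snmpd.conf entry to expose the file (name and path are examples):
    extend cf-execd-state /bin/cat /tmp/cf-execd.state
"""
import os
import subprocess
import sys
import tempfile
import time


def is_running(name):
    """Return True if pgrep finds at least one process with exactly this name."""
    result = subprocess.run(
        ["pgrep", "-x", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def format_state(name, running, now):
    """Build the single line that snmpd will return (format is an assumption)."""
    status = "OK" if running else "CRITICAL"
    return f"{status} {name} checked={int(now)}\n"


def write_state(path, text):
    """Write via a temp file and rename so a reader never sees a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; let the snmpd user read it
    os.replace(tmp, path)


def main(name, path):
    write_state(path, format_state(name, is_running(name), time.time()))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: cf_state.py PROCESS_NAME STATE_FILE", file=sys.stderr)
```

Because the state line does not include the PID, the monitoring side keys on
the status word, so a restart that changes the PID does not by itself look
like a failure. The checked= timestamp lets the monitor notice a stale file.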

> Zenoss could also restart cf-execd if it's been down for some specified
> amount of time. 

Event handlers all the way.

> Also, I figured I should probably have Zenoss also monitor cf-serverd
> and cf-monitord, even though Cfengine already monitors and will restart
> them. My thought here was in case something really gets broken and
> either or both of those two do not start up correctly. So I figured
> maybe only alert if they're down for more than 20 or 30 seconds. Anyone
> dealing with this?

That's exactly what we do.  We actually have daemontools restart the cf
daemons on our servers if they die, and monitoring also watches for missing
processes (something went really wrong) but uses an extra-long threshold so
people aren't paged during "normal", or at least self-healing, restarts.
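
The long-threshold idea can be sketched roughly like this, in Python. The
function name, the sample format, and the numbers are hypothetical; a real
monitoring system would normally express this through its own threshold
settings rather than custom code.

```python
def should_alert(samples, threshold_seconds):
    """Decide whether to page, given (timestamp, is_up) samples in time order.

    Alert only if the process has been continuously down, as of the latest
    sample, for at least threshold_seconds. Short gaps that heal on their own
    (for example a scheduled restart) never reach the threshold.
    """
    down_since = None
    for timestamp, is_up in samples:
        if is_up:
            down_since = None          # recovered, so reset the outage clock
        elif down_since is None:
            down_since = timestamp     # first sample of a new outage
    if down_since is None:
        return False
    return samples[-1][0] - down_since >= threshold_seconds


# A brief self-healing restart: down at t=10, back by t=20, so no page.
restart = [(0, True), (10, False), (20, True)]
# A real outage: down from t=10 and still down at t=50.
outage = [(0, True), (10, False), (20, False), (50, False)]

print(should_alert(restart, 30))
print(should_alert(outage, 30))
```

With a 30-second threshold the first case stays quiet and the second one
alerts, which is the behavior described above.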

_______________________________________________
Help-cfengine mailing list
Help-cfengine@cfengine.org
https://cfengine.org/mailman/listinfo/help-cfengine
