We'll probably use Zenoss to ensure the proper Cfengine processes are running, though maybe just the cf-execd processes, since those in turn can watch cf-serverd and cf-monitord. (How I wish Linux had something like Solaris' Service Management Facility (SMF).) Along those same lines, we'll make use of Cfengine's lastseen feature to report on systems that haven't checked in to the policy server within some defined amount of time.
I agree that failed repairs are more important to know about more quickly than successful repairs. It would just be nice to have some configurable mechanism for detecting such things as quickly as reasonably possible.

Justin

From: Michael Potter [mailto:mega...@gmail.com]
Sent: Tuesday, February 09, 2010 2:30 PM
To: Justin Lloyd
Cc: Neil Watson; help-cfengine@cfengine.org
Subject: Re: Email notification of repairs

On Wed, Feb 10, 2010 at 8:09 AM, Justin Lloyd <jll...@digitalglobe.com> wrote:

Has anyone done any investigation into having a monitoring tool like Zenoss (which we use), Nagios, or OpenNMS watch for repairs?

I use Nagios to capture the tail of promises.log. This is mainly to verify that cfengine is in fact running (Nagios will alert if promises.log does not contain an entry within the last N minutes). This does not show what promises are being repaired however, but IMHO this is a good thing, getting all obsessive about what repairs are occurring was one of my earlier mistakes with cfengine - just let it run and be comfortable in the knowledge that your system is in the desired state.

For repairs that *fail* on the other hand, it would be extremely nice to have some way to capture that information, as it essentially means cfengine tried to bring the system to a desired state but was unable to, a situation that would probably call for urgent manual intervention.

At the very least, centralizing at least some of Cfengine hosts' logs and using a log-watching tool like Swatch or Splunk would be a step in the right direction.

Team Cfengine: Is there any kind of roadmap for integration with such third-party monitoring tools?
Thanks,
Justin

-----Original Message-----
From: help-cfengine-boun...@cfengine.org [mailto:help-cfengine-boun...@cfengine.org] On Behalf Of Neil Watson
Sent: Tuesday, February 09, 2010 12:56 PM
To: help-cfengine@cfengine.org
Subject: Re: Email notification of repairs

The trouble with this type of raw email notification is lack of correlation and reliability. If the MTA is out of action you'll get no notice. If the agent attempts repeated repairs, repeated emails are sent. This can be very disheartening.

As has been mentioned, a monitoring and alerting system would be better for this. Something like OpenNMS can correlate events into a single alarm and escalate notification while avoiding the information storm that can come from 'dumb' notification services.

--
Neil Watson
Linux/UNIX Consultant
http://watson-wilson.ca

_______________________________________________
Help-cfengine mailing list
Help-cfengine@cfengine.org
https://cfengine.org/mailman/listinfo/help-cfengine
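The freshness check Michael Potter describes in the thread (alert if promises.log has no entry within the last N minutes) might look roughly like the sketch below. It is not his actual plugin. As an approximation it looks only at the file's last-modified time instead of parsing entries, and the function name, the threshold, and the log path passed in are all hypothetical. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_log_freshness(path, max_age_minutes, now=None):
    """Return (exit_code, message) based on the log file's last-modified time.

    Assumption: a recently modified log stands in for "contains an entry
    within the last N minutes"; individual entries are not parsed.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError as exc:
        # A missing or unreadable log is reported as UNKNOWN, not CRITICAL.
        return UNKNOWN, f"UNKNOWN: cannot stat {path}: {exc}"
    age_minutes = (now - mtime) / 60.0
    if age_minutes > max_age_minutes:
        return CRITICAL, f"CRITICAL: {path} last written {age_minutes:.0f} min ago"
    return OK, f"OK: {path} last written {age_minutes:.0f} min ago"
```

A wrapper script would print the message and exit with the returned code so that Nagios can pick up the status. Note that this only confirms the agent is still writing to its log; it says nothing about failed repairs, which would need the log contents examined.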