I appreciate the thoughtful responses. To restate one of my original thoughts on this: in general, Cfengine shouldn't be making a lot of repairs, because a repair means something went wrong, and the underlying cause should be investigated and prioritized. There are of course certain kinds of promises that will likely be repaired often and by many machines, such as pulling down the latest copies of certain master files (e.g. a site-wide sudoers configuration), which is why the flexibility to enable or disable repair notifications on a per-promise or per-bundle basis could also be valuable.
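A minimal sketch of the per-promise filtering idea, written in Python and sitting outside Cfengine itself. It assumes repair events have already been collected somewhere as (host, promise handle, outcome) records. That record shape, the handle names, and the `EXPECTED_REPAIRS` set are all hypothetical and are not part of any Cfengine interface or log format. Routine repairs are dropped, and a handle is flagged when several hosts repair it or when any host fails to keep it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class RepairEvent(NamedTuple):
    """Hypothetical record shape; NOT an actual Cfengine log format."""
    host: str
    handle: str   # promise handle
    outcome: str  # "repaired" or "not_kept"


# Assumption: handles whose repairs are routine, e.g. pulling master files.
EXPECTED_REPAIRS = {"copy_sudoers"}


def handles_needing_attention(events: Iterable[RepairEvent],
                              min_hosts: int = 2) -> Dict[str, List[str]]:
    """Map each suspicious promise handle to the sorted hosts that hit it.

    A handle is suspicious if at least `min_hosts` distinct hosts touched
    it, or if any host failed to keep it. Successful repairs of handles
    listed in EXPECTED_REPAIRS are ignored; failures never are.
    """
    hosts_by_handle = defaultdict(set)
    failed = set()
    for event in events:
        if event.outcome == "repaired" and event.handle in EXPECTED_REPAIRS:
            continue  # routine repair, not worth a notification
        hosts_by_handle[event.handle].add(event.host)
        if event.outcome == "not_kept":
            failed.add(event.handle)
    return {
        handle: sorted(hosts)
        for handle, hosts in hosts_by_handle.items()
        if len(hosts) >= min_hosts or handle in failed
    }


if __name__ == "__main__":
    sample = [
        RepairEvent("web01", "copy_sudoers", "repaired"),
        RepairEvent("web01", "fix_passwd_perms", "repaired"),
        RepairEvent("web02", "fix_passwd_perms", "repaired"),
        RepairEvent("db01", "set_nodename", "not_kept"),
        RepairEvent("db02", "ntp_conf", "repaired"),
    ]
    print(handles_needing_attention(sample))
```

With the sample data, the routine sudoers copy and the one-off `ntp_conf` repair are suppressed, while the permissions fix seen on two hosts and the failed nodename promise are reported. The threshold and the expected-handle list are the knobs that would correspond to enabling or disabling notification per promise.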
Cfengine does already log promise repairs, both in a log file and to the policy server's web server. I'm just brainstorming ways of setting up the environment to facilitate faster response time to budding issues. Plugging in to various types of system and log monitoring tools like Zenoss, Nagios, Splunk, etc. would certainly be another approach.

IMHO, too many emails isn't the problem. Too much information is. :) Moving stuff to logs, reports, tweets, etc. just shuffles the information around (squeezing the balloon), but you're still dealing with the same amount of information. Use information well to reduce the amount of new information generated.

Justin

From: help-cfengine-boun...@cfengine.org [mailto:help-cfengine-boun...@cfengine.org] On Behalf Of Michael Potter
Sent: Monday, February 08, 2010 2:23 PM
To: Deb Heller-Evans
Cc: help-cfengine@cfengine.org
Subject: Re: Email notification of repairs

2010/2/9 Deb Heller-Evans <d...@es.net>

Justin,

I see where you're going here - that you want to alert on unkept promises. But I am sure that like many here on this list, I receive hundreds if not thousands of emails per day that are already filtered and sorted with often times more information than I can or want to process.

Alternatively, you could log the condition to a file, rather than send an alert email, and some parsing function could periodically alert you to the negative status.

What you're describing might be a good tickler for a Nagios alert condition. We've found the alert mechanisms in Nagios to scale well over hundreds of systems, without the necessity of email floods. Haven't yet coupled Nagios with Cfengine, but it's on my horizon.
Kind Regards,
deb

Deb Heller-Evans               1 Cyclotron Road
Computer Systems Engineer      Berkeley, CA 94720
ESnet http://www.es.net/       Desk: 510/495-2243

On Fri, 5 Feb 2010 10:55:18 -0700, Justin Lloyd wrote:
> Hi all,
>
> I've opened a ticket on this but I wanted to share my thoughts with
> the community to see if anyone has had the same thought and perhaps
> has already implemented something to this effect.
>
> I'd like for Cfengine on each host to be able to send an email every
> time it tries to repair a promise, whether or not it is successful.

All we need is for cfengine to *log* the fact that a promise repair failed. That would be sufficient, as cf-execd would then notice that the output of cf-agent differs from the previous run, and would email the output as per its normal behavior. This of course assumes you are running cf-execd, and that you are NOT running cf-agent with --inform, since in that case the output will always be different.

> Maybe something as simple as this:
>
> body agent control {
>   repair_email_address => "cfengine-repa...@mycompany.com";
>   # perhaps some additional tunable parameters, e.g.
>   # report_on => { "repaired" | "not_kept" | "any" };
>   # include_error => { "true" | "false" };
>   # success_subject_prefix => "[nova promise repaired] ";
>   # failure_subject_prefix => "[nova promise not kept] ";
>   # etc.
> }
>
> This would allow for a more real-time view of the Cfengine
> environment, by enabling each host to send an email with repair
> success or failure, promise handle, any relevant error message, etc.
> For example, this could help detect repairs immediately, especially
> if the same system keeps repairing the same thing or multiple systems
> are performing the same repair, indicating a fundamental root cause
> that requires administrator intervention.
>
> IMHO (if anyone thinks this opinion is misguided please say so),
> Cfengine shouldn't have to repair anything in a properly functioning
> environment and, if it does, then something needs investigating.
> It may just be someone manually changing a file's permissions and
> Cfengine is correcting them (which may mean user/admin training is
> required). This philosophy does assume, however, that promises are
> written in a way that they will only make corrections when necessary.
>
> For example, if I have a promise to ensure that a Solaris system's
> hostname is in /etc/nodename, I should write the promise so that it
> doesn't do anything if the file is correct, rather than just
> recreating the correct file every time the agent runs, regardless of
> whether the file's contents are already correct.
>
> Any thoughts or comments on this?
>
> Thanks,
> Justin
>
> This electronic communication and any attachments may contain
> confidential and proprietary information of DigitalGlobe, Inc. If you
> are not the intended recipient, or an agent or employee responsible
> for delivering this communication to the intended recipient, or if
> you have received this communication in error, please do not print,
> copy, retransmit, disseminate or otherwise use the information.
> Please indicate to the sender that you have received this
> communication in error, and delete the copy you received.
> DigitalGlobe reserves the right to monitor any electronic
> communication sent or received by its employees, agents or
> representatives.
>
> _______________________________________________
> Help-cfengine mailing list
> Help-cfengine@cfengine.org
> https://cfengine.org/mailman/listinfo/help-cfengine
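The /etc/nodename example in the quoted message can be sketched as a check-before-write routine. This is a Python illustration of the idea only, not Cfengine policy. The function name `ensure_file_content` and the "kept"/"repaired" return strings are hypothetical, borrowed loosely from the vocabulary used in this thread, and the sketch deliberately ignores ownership and permissions.

```python
import tempfile
from pathlib import Path


def ensure_file_content(path: Path, desired: str) -> str:
    """Write `desired` to `path` only if the current content differs.

    Returns "kept" when nothing had to change and "repaired" when the
    file was created or rewritten. (Hypothetical helper for illustration.)
    """
    try:
        current = path.read_text()
    except FileNotFoundError:
        current = None  # a missing file counts as "not correct"
    if current == desired:
        return "kept"  # already correct: touch nothing, report nothing
    path.write_text(desired)
    return "repaired"


if __name__ == "__main__":
    # Demonstrate on a scratch directory rather than the real /etc/nodename.
    with tempfile.TemporaryDirectory() as scratch:
        nodename = Path(scratch) / "nodename"
        print(ensure_file_content(nodename, "myhost\n"))  # first run writes the file
        print(ensure_file_content(nodename, "myhost\n"))  # second run changes nothing
```

Because the second run returns "kept", a notification scheme keyed on repairs would stay silent for it. A routine that rewrote the file unconditionally would look like a repair on every agent run.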