I appreciate the thoughtful responses. To restate one of my original
thoughts on this, in general Cfengine shouldn't be making a lot of
repairs, because a repair means something went wrong, for which the
underlying cause should be prioritized and investigated. There are of
course certain kinds of promises that will likely be repaired often and
by many machines, such as pulling down the latest copies of certain
master files (e.g. a site-wide sudoers configuration), which is why
flexibility in being able to enable or disable notifications on a
per-promise or per-bundle basis could also be valuable.

 

Cfengine does already log promise repairs, both in a log file and to the
policy server's web server. I'm just brainstorming ways of setting up
the environment to facilitate faster response time to budding issues.
Plugging in to various types of system and log monitoring tools like
Zenoss, Nagios, Splunk, etc. would certainly be another approach.

 

IMHO, too many emails isn't the problem. Too much information is. :)
Moving stuff to logs, reports, tweets, etc. just shuffles the
information around (squeezing the balloon), but you're still dealing
with the same amount of information. Use information well to reduce the
amount of new information generated.

 

Justin

 

From: help-cfengine-boun...@cfengine.org
[mailto:help-cfengine-boun...@cfengine.org] On Behalf Of Michael Potter
Sent: Monday, February 08, 2010 2:23 PM
To: Deb Heller-Evans
Cc: help-cfengine@cfengine.org
Subject: Re: Email notification of repairs

 

 

2010/2/9 Deb Heller-Evans <d...@es.net>

Justin,

I see where you're going here - that you want to alert on unkept
promises. But I am sure that, like many here on this list, I receive
hundreds if not thousands of emails per day that are already filtered
and sorted, often with more information than I can or want to
process.  Alternatively, you could log the condition to a file, rather
than send an alert email, and some parsing function could periodically
alert you to the negative status.
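
To illustrate the "log it, then parse periodically" idea, here is a
minimal Python sketch. The log format (timestamp, host, promise handle,
outcome on one line) is purely hypothetical and is not Cfengine's actual
log format; the function name and threshold are likewise made up for
illustration.

```python
from collections import Counter


def summarize_repairs(lines, threshold=3):
    """Return {(handle, outcome): count} for events seen >= threshold times.

    Assumes a hypothetical log format, one event per line:
        <timestamp> <host> <promise_handle> <repaired|not_kept>
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        _timestamp, _host, handle, outcome = fields
        if outcome in ("repaired", "not_kept"):
            counts[(handle, outcome)] += 1
    # Only report conditions that recur often enough to deserve attention.
    return {key: n for key, n in counts.items() if n >= threshold}


# Made-up sample data in the hypothetical format above.
sample_log = [
    "2010-02-08T14:00 web01 sudoers_copy repaired",
    "2010-02-08T14:05 web02 sudoers_copy repaired",
    "2010-02-08T14:05 db01 nodename_file not_kept",
    "2010-02-08T14:10 web01 sudoers_copy repaired",
    "",
    "garbage line",
]

if __name__ == "__main__":
    for (handle, outcome), n in summarize_repairs(sample_log).items():
        print(f"{handle}: {outcome} {n} times")
```

A cron job could run something like this over the day's log and raise a
single alert (or feed a Nagios check) instead of one email per event.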

What you're describing might be a good tickler for a Nagios alert
condition. We've found the alert mechanisms in Nagios to scale well over
hundreds of systems, without the necessity of email floods. Haven't yet
coupled Nagios with Cfengine, but it's on my horizon.


Kind Regards,
deb

Deb Heller-Evans               1 Cyclotron Road
Computer Systems Engineer      Berkeley, CA 94720
ESnet  http://www.es.net/      Desk: 510/495-2243


On Fri, 5 Feb 2010 10:55:18 -0700, Justin Lloyd wrote:
> Hi all,
>
> I've opened a ticket on this but I wanted to share my thoughts with
> the community to see if anyone has had the same thought and perhaps
> has already implemented something to this effect.
>
> I'd like for Cfengine on each host to be able to send an email every
> time it tries to repair a promise, whether or not it is successful.

 

All we need is for cfengine to *log* the fact that a promise repair
failed. That would be sufficient, as then cf-execd would notice the
output of cf-agent differs from the previous run, and will email the
output as per its normal behavior. This of course assumes you are
running cf-execd, and you are NOT running cf-agent with --inform, in
which case the output will always be different.
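
The "only mail when the output changes" behavior described above can be
sketched roughly as follows. This is an illustration of the idea in
Python, not cf-execd's actual implementation; the function names and the
use of a SHA-256 digest for the comparison are assumptions made for the
example.

```python
import hashlib


def digest_of(output):
    """Fingerprint a run's output so it can be compared with the last run."""
    return hashlib.sha256(output.encode("utf-8")).hexdigest()


def should_notify(output, previous_digest):
    """Return (notify, new_digest); notify is True when the output changed."""
    new_digest = digest_of(output)
    return new_digest != previous_digest, new_digest


def simulate(runs):
    """Feed successive run outputs through should_notify and collect decisions."""
    previous = None
    decisions = []
    for output in runs:
        notify, previous = should_notify(output, previous)
        decisions.append(notify)
    return decisions


if __name__ == "__main__":
    # Made-up outputs: the same failure twice, then a different message.
    runs = [
        "repair failed: /etc/sudoers",
        "repair failed: /etc/sudoers",
        "repair failed: /etc/nodename",
    ]
    print(simulate(runs))
```

With identical output on consecutive runs only the first one triggers a
mail. Output that changes on every run (timestamps, verbose informational
messages) would trigger one every time, which is the --inform caveat.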

 

        > Maybe something as simple as this:
        >
        > body agent control {
        >     repair_email_address => "cfengine-repa...@mycompany.com";
        >     # perhaps some additional tunable parameters, e.g.
        >     #  report_on => { "repaired" | "not_kept" | "any" };
        >     #  include_error => { "true" | "false" };
        >     #  success_subject_prefix => "[nova promise repaired] ";
        >     #  failure_subject_prefix => "[nova promise not kept] ";
        >     #  etc.
        > }
        >
        > This would allow for a more real-time view of the Cfengine
        > environment, by enabling each host to send an email with repair
        > success or failure, promise handle, any relevant error message,
        > etc. For example, this could help detect repairs immediately,
        > especially if the same system keeps repairing the same thing or
        > multiple systems are performing the same repair, indicating a
        > fundamental root cause that requires administrator intervention.
        >
        > IMHO (if anyone thinks this opinion is misguided please say so),
        > Cfengine shouldn't have to repair anything in a properly
        > functioning environment and, if it does, then something needs
        > investigating. It may just be someone manually changing a file's
        > permissions and Cfengine is correcting them (which may mean
        > user/admin training is required). This philosophy does assume,
        > however, that promises are written in a way that they will only
        > make corrections when necessary.
        >
        > For example, if I have a promise to ensure that a Solaris
        > system's hostname is in /etc/nodename, I should write the promise
        > so that it doesn't do anything if the file is correct, rather
        > than just recreating the correct file every time the agent runs,
        > regardless of whether the file's contents are already correct.
        >
        > Any thoughts or comments on this?
        >
        > Thanks,
        > Justin
        >
        > This electronic communication and any attachments may contain
        > confidential and proprietary information of DigitalGlobe, Inc.
        > If you are not the intended recipient, or an agent or employee
        > responsible for delivering this communication to the intended
        > recipient, or if you have received this communication in error,
        > please do not print, copy, retransmit, disseminate or otherwise
        > use the information. Please indicate to the sender that you have
        > received this communication in error, and delete the copy you
        > received. DigitalGlobe reserves the right to monitor any
        > electronic communication sent or received by its employees,
        > agents or representatives.
        >
        > _______________________________________________
        > Help-cfengine mailing list
        > Help-cfengine@cfengine.org
        > https://cfengine.org/mailman/listinfo/help-cfengine

 


