I monitor my cfengine systems in three different ways:

1) Nagios check of TCP port 5308 (cf-serverd) on the policy servers
2) Nagios NRPE check_file_age check to ensure the clients are updating properly (see below)
3) Hourly cf-serverd status check/fix (see below)
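Item 1 is just a TCP connect test against cf-serverd's port. The stock Nagios check_tcp plugin (e.g. `check_tcp -H <policy server> -p 5308`) handles it. The Python sketch below only illustrates the same test with the standard Nagios exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). The script name check_cf_port.py and the function names are hypothetical. A successful connect only shows that something is listening on 5308. It does not show that cf-serverd is actually serving policy, and that gap is what the file-age check covers.

```python
# check_cf_port.py (hypothetical name): minimal Nagios-style TCP check
# for cf-serverd's port. A sketch, not a replacement for check_tcp.
import socket

# Standard Nagios plugin exit codes
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_port(host, port=5308, timeout=10.0):
    """Try a TCP connect to host:port and return (exit_code, message)."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout) as exc:
        return CRITICAL, "CRITICAL - %s:%d not reachable (%s)" % (host, port, exc)
    conn.close()
    return OK, "OK - %s:%d accepting connections" % (host, port)


def main(argv):
    """Plugin entry point: expects exactly one argument, the host name."""
    if len(argv) != 1:
        print("UNKNOWN - usage: check_cf_port.py HOST")
        return UNKNOWN
    code, message = check_port(argv[0])
    print(message)
    return code
```

To run it from NRPE or Nagios, add `import sys` and end the script with `sys.exit(main(sys.argv[1:]))`. Nagios reads the exit code for the state and the printed line for the status text.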
Nagios NRPE check_file_age check
################################

If the last_update file hasn't been updated recently, fire off an alert. With the thresholds below, Nagios warns once the file is older than 2700 seconds (45 minutes) and goes critical after 7500 seconds (125 minutes).

file: nrpe.cfg
    /usr/lib/nagios/plugins/check_file_age -w 2700 -c 7500 -f /var/cfengine/last_update

file: update.cf
    policy_updated::
      "/var/cfengine/last_update"
        comment => "Update /var/cfengine/last_update with last policy update time",
        create => "true",
        edit_defaults => empty,
        edit_line => expand_template("/var/cfengine/inputs/templates/cf_update.tpl");

file: /var/cfengine/last_update
    # cat /var/cfengine/last_update
    # This server is managed by CFengine, any manual edits may be reverted.
    # CFengine policy last updated: Fri Apr 22 14:12:02 2011.

Hourly cf-serverd status check
##############################

Once every 3-5 days, at random times, my 3.1.2 cf-serverd process segfaults. Until I have a chance to test and roll out an updated version, as recommended by Eystein Stenberg, I use this script to make sure my cf-serverd processes are up and running.

Log error:
    Apr 21 07:19:47 usg-cfeps7901 kernel: cf-serverd[25491]: segfault at 000000000000f790 rip 00002ab5227497dd rsp 0000000041efd900 error 4

file: /etc/cron.hourly/cfservd_check.py
------------------------------------------
#!/usr/bin/env python
# Hourly cron job: restart cf-serverd if it is no longer running.
# "cfservd" is the name of the local init script for cf-serverd.
import os
import subprocess
import syslog

SERVICE = "/sbin/service"

# 'service cfservd status' exits non-zero when the daemon is not running;
# its output is discarded so cron doesn't send mail every hour.
with open(os.devnull, "w") as devnull:
    exit_status = subprocess.call([SERVICE, "cfservd", "status"], stdout=devnull)

# If cfservd isn't running, log what we are doing and restart it
if exit_status != 0:
    syslog.syslog("Restarting cfservd!")
    subprocess.call([SERVICE, "cfservd", "restart"])

On Fri, Apr 22, 2011 at 1:15 PM, <mashtin.ba...@gmail.com> wrote:
> We had a case where an early release cfengine (I think 3.0.2b1)
> was running but not updating files on clients. I believe the problem
> here is a cf-server that's horked and we'll be upgrading this soon
> but it raised the question: how do you monitor cfengine? Obviously
> cfengine shouldn't monitor itself.
> And from the above case, having
> Nagios check for client and server processes running is also insufficient.
> I'm thinking perhaps have cfengine create and push a timestamp file
> to all clients and have nagios check for presence/date of such files.
>
> What are others doing?
>
> TIA
>
> _______________________________________________
> Help-cfengine mailing list
> Help-cfengine@cfengine.org
> https://cfengine.org/mailman/listinfo/help-cfengine
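On the timestamp-file idea in the quoted message: the check_file_age setup above already does this. The following is a minimal Python sketch of the same logic for anyone without the Nagios plugins package. It treats the file's mtime as the signal, the same way check_file_age does, and reuses the 2700/7500-second thresholds from the nrpe.cfg line. The function name is hypothetical. Treating a missing file as CRITICAL is a choice this sketch makes, not a description of check_file_age's own behaviour.

```python
import os
import time

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL = 0, 1, 2


def check_file_age(path, warn=2700, crit=7500, now=None):
    """Return (exit_code, message) based on how long ago path was modified.

    Thresholds are in seconds; the defaults mirror the nrpe.cfg line
    (-w 2700 -c 7500). A missing file is reported as CRITICAL.
    """
    if now is None:
        now = time.time()
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        return CRITICAL, "CRITICAL - %s is missing" % path
    if age >= crit:
        return CRITICAL, "CRITICAL - %s is %d seconds old" % (path, age)
    if age >= warn:
        return WARNING, "WARNING - %s is %d seconds old" % (path, age)
    return OK, "OK - %s is %d seconds old" % (path, age)
```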