I monitor my cfengine systems in three ways:

1) A Nagios check of TCP port 5308 on the policy servers
2) A Nagios NRPE check_file_age check to ensure the clients are updating
properly (see below)
3) An hourly cf-serverd status check/fix
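
For item 1, a plain TCP connect test is usually enough. Here is a minimal
sketch of that idea in Python, not the plugin I actually run; the function
name check_port and the 5 second timeout are my own assumptions, and only the
port number 5308 comes from the setup above. It returns the usual Nagios exit
codes (0 for OK, 2 for CRITICAL).

```python
import socket

# Standard Nagios plugin exit codes
OK, CRITICAL = 0, 2


def check_port(host, port=5308, timeout=5.0):
    """Try a TCP connect and return (nagios_exit_code, message).

    check_port is a hypothetical helper for illustration only.
    """
    try:
        sock = socket.create_connection((host, port), timeout)
    except OSError as exc:
        # Refused, unreachable, or timed out: nothing is answering
        return CRITICAL, "CRITICAL: cannot connect to %s:%d (%s)" % (host, port, exc)
    sock.close()
    return OK, "OK: %s:%d is accepting connections" % (host, port)
```

To use it as a plugin, print the message and call sys.exit() with the
returned code. Note that this only proves something is listening on the port,
not that policy is actually being served, which is why the file age check in
item 2 is still needed.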

Nagios NRPE check_file_age check
###################################
If /var/cfengine/last_update is older than 2700 seconds (45 minutes), Nagios
raises a warning; if it is older than 7500 seconds (125 minutes), it raises a
critical alert.


file: nrpe.cfg
/usr/lib/nagios/plugins/check_file_age -w 2700 -c 7500 -f /var/cfengine/last_update
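
For anyone without the Nagios plugins installed, the threshold logic can be
sketched in a few lines of Python. This is only an illustration of what the
-w and -c values above mean, not the real check_file_age plugin; the function
name, the "now" parameter, and treating a missing file as CRITICAL are my own
assumptions.

```python
import os
import time

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL = 0, 1, 2


def check_file_age(path, warn=2700, crit=7500, now=None):
    """Return (nagios_exit_code, message) based on the file's mtime.

    Hypothetical re-implementation for illustration; thresholds are in
    seconds and default to the values used in nrpe.cfg above.
    """
    if now is None:
        now = time.time()
    try:
        age = int(now - os.path.getmtime(path))
    except OSError as exc:
        # Assumption: a missing or unreadable file counts as CRITICAL
        return CRITICAL, "CRITICAL: %s" % exc
    if age > crit:
        return CRITICAL, "CRITICAL: %s is %d seconds old" % (path, age)
    if age > warn:
        return WARNING, "WARNING: %s is %d seconds old" % (path, age)
    return OK, "OK: %s is %d seconds old" % (path, age)
```

Calling check_file_age("/var/cfengine/last_update") with the defaults applies
the same 45 minute and 125 minute limits as the NRPE command.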

file: update.cf
     policy_updated::
        "/var/cfengine/last_update"
        comment       => "Record the time of the last policy update",
        create        => "true",
        edit_defaults => empty,
        edit_line     => expand_template("/var/cfengine/inputs/templates/cf_update.tpl");

file: /var/cfengine/last_update
# cat /var/cfengine/last_update
# This server is managed by CFengine, any manual edits may be reverted.
# CFengine policy last updated: Fri Apr 22 14:12:02 2011.



Hourly cf-serverd status check
###########################
Once every 3-5 days, at random times, my 3.1.2 cf-serverd process segfaults.
Until I have a chance to test and roll out an updated version, as recommended
by Eystein Stenberg, I use this script to make sure the process is up and
running.

Log error:
Apr 21 07:19:47 usg-cfeps7901 kernel: cf-serverd[25491]: segfault at 000000000000f790 rip 00002ab5227497dd rsp 0000000041efd900 error 4

file: /etc/cron.hourly/cfservd_check.py
------------------------------------------
#!/usr/bin/env python
# Restart cfservd if its init script reports that it is not running.

import subprocess
import syslog

# Full path, since cron jobs run with a minimal PATH
SERVICE = "/sbin/service"

# 'service cfservd status' exits non-zero when the daemon is not running
exit_status = subprocess.call([SERVICE, "cfservd", "status"])

# If cfservd isn't running, log what we are doing and restart it
if exit_status != 0:
    syslog.syslog(syslog.LOG_WARNING, "cfservd is not running; restarting it")
    subprocess.call([SERVICE, "cfservd", "restart"])




On Fri, Apr 22, 2011 at 1:15 PM, <mashtin.ba...@gmail.com> wrote:

> We had a case where an early release cfengine (I think 3.0.2b1)
> was running but not updating files on clients. I believe the problem
> here is a cf-server that's horked and we'll be upgrading this soon
> but it raised the question; How do you monitor cfengine? Obviously
> cfengine shouldn't monitor itself. And from the above case, having
> Nagios check for client and server processes running is also insufficient.
> I'm thinking perhaps have cfengine create and push a timestamp file
> to all clients and have nagios check for presence/date of such files.
>
> What are others doing?
>
> TIA
>
> _______________________________________________
> Help-cfengine mailing list
> Help-cfengine@cfengine.org
> https://cfengine.org/mailman/listinfo/help-cfengine
>
>