Hi,

I'm trying to debug a problem where the named daemon isn't restarting properly when asked to do so via a shellcommands statement, typically when cfagent is run from a cfrun command executed on my cfengine master server. The restart failures do not happen every time; it's more of a 50/50 split between success and failure.

I'm using cfengine 2.4.14 and Red Hat Enterprise Linux 3 on my servers. In our cfengine environment, named is only restarted when a new named.conf file is copied from our master cfengine server to a DNS server (one in the dns_master or dns_slave class), using a define inside the copy statement. Both of those statements are below.

Once a new named.conf file is put on the master cfengine server, we run a quick shell script that does some local housekeeping tasks, then asks cfagent to run on each DNS server to update itself with the latest config files and restart named:

    cfrun -f <pathto>/cfrun.hosts host1 host2 host3 host4

The relevant configuration:

    copy:

       # this is for the DNS master server only
       dns_master::
          $(configroot)/os/etc/named.conf.MASTER
             dest=/etc/named.conf
             server=$(cfserver)
             owner=root
             group=root
             mode=0644
             define=named_restart

       # this is for the rest of the DNS slave servers
       dns_slave::
          $(configroot)/os/etc/named.conf.SLAVE
             dest=/etc/named.conf
             server=$(cfserver)
             owner=root
             group=root
             mode=0644
             define=named_restart

    shellcommands:

       named_restart::
          "/sbin/service named restart"

This type of copy/define/shellcommands setup works great in our environment for ntpd, innd, sendmail, sshd, and others, but not for named. When I run the cfrun command with -d3 output, I see the following, which suggests that named stops just fine but still appears to be running when it is time to start it back up.

    *********************************************************************
     Main Tree Sched: shellcommands pass 1 @ Mon Jun 26 14:25:58 2006
    *********************************************************************

    cfengine:host1: Executing script /sbin/service named restart...(timeout=0,uid=-1,gid=-1)
    (Setting umask to 77)
    cfengine:host1:in/service name: Stopping named: [ OK ]
    cfengine:host1:in/service name: named: already running
    cfengine:host1: Finished script /sbin/service named restart
    ---------------------------------------------------------------------

Thinking that there wasn't ample time between the service shutdown and startup, I tried replacing "/sbin/service named restart" in the shellcommands section with

    "/sbin/service named stop"
    "/bin/sleep 6"
    "/sbin/service named start"

but was unsuccessful. In the shellcommands section, I've also experimented with:

    timeout=15
    useshell=false

I also tried moving the named_restart class from shellcommands: to processes:, like this

    processes:

       named_restart::
          "named" restart "/etc/init.d/named restart"

or this

       named_restart::
          "named" matches=1 restart "/sbin/service named restart"

but those also failed.

Has anyone on this mailing list experienced this problem, or does anyone have tips to help me fix it?

Many thanks,
Tom