Hi,

I'm trying to debug a problem where the named daemon isn't restarting 
properly when asked to do so via a shellcommands statement, typically 
when cfagent is asked to run by a cfrun command executed on my 
cfengine master server.  The restart failures do not happen every 
time; it's more of a 50/50 split between success and failure.

I'm using cfengine 2.4.14 and Red Hat Enterprise Linux 3 on my servers.

In our cfengine environment, named is only restarted when a new 
named.conf file is copied from our master cfengine server to the server 
that belongs to the dns_master class, using a define inside of the copy 
statement.  Both of those statements are below.  Once a new named.conf 
file is put on the master cfengine server, we run a quick shell script that 
does some local housekeeping tasks, then asks cfagent to run on each DNS 
server to update itself with the latest config files and restart named.

cfrun -f <pathto>/cfrun.hosts host1 host2 host3 host4

copy:

# this is for the DNS master server only
dns_master::
$(configroot)/os/etc/named.conf.MASTER  dest=/etc/named.conf
        server=$(cfserver)
        owner=root
        group=root
        mode=0644
        define=named_restart

# this is for the rest of the DNS slave servers
dns_slave::
$(configroot)/os/etc/named.conf.SLAVE  dest=/etc/named.conf
        server=$(cfserver)
        owner=root
        group=root
        mode=0644
        define=named_restart


shellcommands:

        named_restart::
        "/sbin/service named restart"

This type of copy/define shellcommand setup works great in our 
environment for ntpd, innd, sendmail, sshd, and others, but not for 
named.  When I run the cfrun command with -d3 output, I see the 
following, which suggests that named stops just fine but still appears 
to be running when it is time to start it back up.

*********************************************************************
  Main Tree Sched: shellcommands pass 1 @ Mon Jun 26 14:25:58 2006
*********************************************************************
cfengine:host1: Executing script /sbin/service named restart...(timeout=0,uid=-1,gid=-1)
(Setting umask to 77)
cfengine:host1:in/service name: Stopping named: [  OK  ]
cfengine:host1:in/service name: named: already running
cfengine:host1: Finished script /sbin/service named restart
---------------------------------------------------------------------


Thinking that there wasn't ample time between the service shutdown & 
startup, I tried replacing "/sbin/service named restart" in the 
shellcommands section with

"/sbin/service named stop"
"/bin/sleep 6"
"/sbin/service named start"

but was unsuccessful.  In the shellcommands section, I've also 
experimented with:

timeout=15
useshell=false

I tried moving the named_restart class from shellcommands: to processes: 
like this

processes:

named_restart::
        "named" restart "/etc/init.d/named restart"

or this

named_restart::
        "named"
        matches=1
        restart "/sbin/service named restart"


but those also failed.
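
A variant of the stop/sleep/start attempt above would be to poll until 
the old named process has really exited, rather than sleeping for a 
fixed six seconds.  The following is only a minimal sketch of that 
idea, not something from our current setup: the pid file path, the 
15-second limit, the function names, and the --run switch are all 
assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: stop named, wait until the old process is gone, then start it.
# PIDFILE is an assumed location; adjust it to wherever named writes its pid.
PIDFILE="${PIDFILE:-/var/run/named.pid}"

# wait_for_pid_exit PID SECONDS
# Returns 0 once PID no longer exists, 1 if it is still alive after SECONDS.
# Uses "kill -0", which only tests for existence and sends no signal.
wait_for_pid_exit() {
    pid=$1
    remaining=$2
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# restart_named
# Records the running pid, stops the service, waits for that pid to vanish,
# and only then starts the service again.
restart_named() {
    old_pid=""
    [ -r "$PIDFILE" ] && old_pid=$(cat "$PIDFILE")
    /sbin/service named stop
    if [ -n "$old_pid" ] && ! wait_for_pid_exit "$old_pid" 15; then
        echo "named (pid $old_pid) still running after 15s" >&2
        return 1
    fi
    /sbin/service named start
}

# Only act when invoked with --run, so the file can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    restart_named
fi
```

Saved under some local path (hypothetical), the script would be called 
with --run from the named_restart:: entry in shellcommands in place of 
"/sbin/service named restart".  It would need to run as root so that 
"kill -0" can see the named process.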

Has anyone on this mailing list experienced this problem or have some 
tips to help me fix it?

Many thanks,

Tom
