Hi,
I'm trying to debug a problem where the named daemon isn't properly
restarting when asked to do so via a shellcommands statement, usually
when cfagent is asked to run from a cfrun command executed on
my cfengine master server. The restart failures do not happen
every time; it's more of a 50/50 split between success and failure.
I'm using cfengine 2.4.14 and Red Hat Enterprise Linux 3 on my servers.
In our cfengine environment, named is only restarted when a new
named.conf file is copied from our master cfengine server to the server
that belongs to the dns_master class, using a define inside of the copy
statement. Both of those statements are below. Once a new named.conf
file is put on the master cfengine server, we run a quick shell script that
does some local housekeeping tasks, then asks cfagent to run on each DNS
server to update itself with the latest config files and restart named.

    cfrun -f <pathto>/cfrun.hosts host1 host2 host3 host4

    copy:

       # this is for the DNS master server only
       dns_master::

          $(configroot)/os/etc/named.conf.MASTER dest=/etc/named.conf
             server=$(cfserver)
             owner=root
             group=root
             mode=0644
             define=named_restart

       # this is for the rest of the DNS slave servers
       dns_slave::

          $(configroot)/os/etc/named.conf.SLAVE dest=/etc/named.conf
             server=$(cfserver)
             owner=root
             group=root
             mode=0644
             define=named_restart

    shellcommands:

       named_restart::

          "/sbin/service named restart"

This type of copy/define shellcommand setup works great in our
environment for ntpd, innd, sendmail, sshd, and others, but not for
named. When I run the cfrun command with -d3 debug output, I see the
following, which suggests that named stops just fine but is still
considered running when it is time to start it back up:
*********************************************************************
Main Tree Sched: shellcommands pass 1 @ Mon Jun 26 14:25:58 2006
*********************************************************************
cfengine:host1: Executing script /sbin/service named restart...(timeout=0,uid=-1,gid=-1)
(Setting umask to 77)
cfengine:host1:in/service name: Stopping named: [ OK ]
cfengine:host1:in/service name: named: already running
cfengine:host1: Finished script /sbin/service named restart
---------------------------------------------------------------------
Thinking that there wasn't ample time between the service shutdown and
startup, I tried replacing "/sbin/service named restart" in the
shellcommands section with

    "/sbin/service named stop"
    "/bin/sleep 6"
    "/sbin/service named start"

but that was unsuccessful. In the shellcommands section, I've also
experimented with these options:

    timeout=15
    useshell=false

I also tried moving the named_restart class from shellcommands: to
processes:, like this

    processes:

       named_restart::

          "named" restart "/etc/init.d/named restart"

or this

       named_restart::

          "named"
             matches=1
             restart "/sbin/service named restart"

but those also failed.
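
If the underlying cause is that the stop step returns before the old
named process has fully exited (which is only a guess based on the
output above), one untested alternative to a fixed sleep is to poll
until the old PID is really gone before starting named again. Below is
a minimal sketch of that idea. The names wait_until_gone and
restart_named are hypothetical helpers, not part of cfengine or the
Red Hat init scripts, and the default PID file path is an assumption
that would need to be checked against the named init script.

```shell
#!/bin/sh

# Hypothetical helper: wait until the process with the given PID has
# exited, or give up after a number of seconds (default 30).
# Returns 0 if the process is gone, 1 if it is still there at the limit.
wait_until_gone() {
    pid=$1
    limit=${2:-30}
    waited=0
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Hypothetical wrapper that a shellcommands entry could call instead of
# "/sbin/service named restart".
# ASSUMPTION: the PID file path below is a placeholder; pass the real
# one as the first argument.
restart_named() {
    pidfile=${1:-/var/run/named.pid}
    oldpid=""
    if [ -r "$pidfile" ]; then
        oldpid=$(cat "$pidfile")
    fi
    /sbin/service named stop
    if [ -n "$oldpid" ]; then
        # Do not start a new named while the old one is still shutting down.
        wait_until_gone "$oldpid" 30 || return 1
    fi
    /sbin/service named start
}
```

Saved as a script that ends with a call to restart_named, this could
replace the "/sbin/service named restart" line under named_restart::.
It only helps if the old process really is still exiting when the
start step runs.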
Has anyone on this mailing list experienced this problem, or does
anyone have some tips to help me fix it?
Many thanks,
Tom