Hi, I've recently started to have a problem where the puppetd
processes on some of my clients are locking up (the puppetdlock
file is several hours old). My server is running puppet 2.7.12 on
CentOS 6.2 and my clients are running puppet 2.7.12 on Scientific
Linux 6.2. If I check the puppetdlock file, it contains the pid of
the currently "running" puppet. If I restart puppetd, it's fine
for a while, but sooner or later I end up in the same state. If I
run strace against the puppetd, I get:
# strace -p 10726
Process 10726 attached - interrupt to quit
select(8, [7], NULL, NULL, {1, 560249}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
select(8, [7], NULL, NULL, {2, 0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
select(8, [7], NULL, NULL, {2, 0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
select(8, [7], NULL, NULL, {2, 0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
^C <unfinished ...>
Process 10726 detached
If I run lsof, I get:
# lsof -p 10726
COMMAND   PID USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
puppetd 10726 root  cwd    DIR                8,1     4096        2 /
puppetd 10726 root  rtd    DIR                8,1     4096        2 /
puppetd 10726 root  txt    REG                8,1    10576  8151417 /usr/bin/ruby
[...]
puppetd 10726 root  mem    REG                8,1    26050  8153796 /usr/lib64/gconv/gconv-modules.cache
puppetd 10726 root    0r   CHR                1,3      0t0     3820 /dev/null
puppetd 10726 root    1w   CHR                1,3      0t0     3820 /dev/null
puppetd 10726 root    2w   CHR                1,3      0t0     3820 /dev/null
puppetd 10726 root    3r  FIFO                0,8      0t0 17283753 pipe
puppetd 10726 root    4w  FIFO                0,8      0t0 17283753 pipe
puppetd 10726 root    5u  unix 0xffff88013680b0c0      0t0 17283804 socket
puppetd 10726 root    6u   REG                8,1     6045  3145906 /var/log/puppet/http.log
puppetd 10726 root    7u  IPv4           17283830      0t0      TCP *:8139 (LISTEN)
If I look at what puppet is running:
# ps -elfw | grep 10726
5 S root 10726     1 0 81 1 - 61549 poll_s 15:15 ? 00:00:17 /usr/bin/ruby /usr/sbin/puppetd --debug --verbose
0 Z root 11429 10726 0 81 1 -     0 exit   15:39 ? 00:00:00 [sh]<defunct>
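
In case it helps anyone reproduce this, the check I'm doing by hand
(is the lock file old, and is the pid in it still alive?) looks
roughly like the sketch below. The names lock_status and
pid_is_alive and the one-hour threshold are my own choices for
illustration, not anything puppet provides, and the lock file path
is passed on the command line rather than assumed.

```python
import os
import sys
import time


def pid_is_alive(pid):
    """Return True if a process with this pid exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def lock_status(lock_path, max_age_seconds=3600, now=None):
    """Classify a pid lock file as 'absent', 'orphaned', 'stale' or 'fresh'."""
    try:
        current = time.time() if now is None else now
        age = current - os.stat(lock_path).st_mtime
        with open(lock_path) as handle:
            pid_text = handle.read().strip()
    except FileNotFoundError:
        return "absent"
    if pid_text.isdigit() and not pid_is_alive(int(pid_text)):
        return "orphaned"  # lock left behind by a process that is gone
    if age > max_age_seconds:
        return "stale"     # owner still exists but the lock is old
    return "fresh"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_lock.py <path to puppetdlock>
    print(lock_status(sys.argv[1]))
```

On the affected clients I'd expect this to report "stale", since
the pid in the lock file is still alive but the file is hours old.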
Help?
...dave