On Friday, January 6, 2012 5:31:34 PM UTC+1, jcbollinger wrote:
>
>
> Nothing in your log suggests that the Puppet agent is doing any work 
> when it fails.  It appears to apply a catalog successfully, then 
> create a report successfully, then nothing else.  That doesn't seem 
> like a problem in a module.  Nevertheless, you could try removing 
> classes from the affected node's configuration and testing whether 
> Puppet still freezes. 
>

John, thanks for your reply. I'll deploy a node that includes no modules 
at all and see whether a zombie process appears again.
 

> You said the agent runs for several hours before it hangs.  Does it 
> perform multiple successful runs during that time?  That also would 
> tend to counterindicate a problem in your manifests. 
>

Yes, the agents perform several runs (with no changes to the catalog) and 
then simply freeze up, waiting for the defunct sh process to return.
 

> I'm suspicious that something else on your systems is interfering with 
> the Puppet process; some kind of service manager, for example.  You'll 
> have to say whether that's a reasonable guess.  Alternatively, you may 
> have a system-level bug; there have been a few Ruby bugs and kernel 
> regressions that interfered with Puppet operation.
>

Those are all pretty plain Ubuntu 10.04.3 server installations (both i386 
and x86_64), especially the ones I deployed this week, which aren't in 
production yet. What kind of service manager could there even be that 
interferes?  
 

> You could try using strace to determine where the failure happens, 
> though that's not as simple as it may sound. 
>

Simply trying to strace the zombie process only results in "Operation 
not permitted". Attaching strace to the agent process shows these lines 
repeatedly:

Process 3741 attached - interrupt to quit
select(8, [7], NULL, NULL, {1, 723393}) = 0 (Timeout)
sigprocmask(SIG_BLOCK, NULL, [])        = 0
sigprocmask(SIG_BLOCK, NULL, [])        = 0
select(8, [7], NULL, NULL, {2, 0})      = 0 (Timeout)
sigprocmask(SIG_BLOCK, NULL, [])        = 0
sigprocmask(SIG_BLOCK, NULL, [])        = 0
...

That doesn't tell me anything other than that the puppet agent is blocking 
on select() with a timeout of two seconds.
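
Since strace can't attach to the defunct process, the parent/child 
relationship has to be read from /proc instead. A minimal sketch, assuming 
Linux with /proc mounted (the function name find_zombies is made up for 
illustration, it is not part of Puppet or any tool mentioned here):

```python
import os


def find_zombies(proc_root="/proc"):
    """Return a list of (pid, ppid, name) for every process in state Z."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat"),
                      errors="replace") as f:
                data = f.read()
        except OSError:
            continue  # process disappeared while we were scanning
        # Layout is "pid (comm) state ppid ..."; comm may itself contain
        # spaces or parentheses, so split around the last ')'.
        lpar = data.find("(")
        rpar = data.rfind(")")
        if lpar == -1 or rpar == -1:
            continue
        name = data[lpar + 1:rpar]
        fields = data[rpar + 2:].split()
        if len(fields) < 2:
            continue
        state, ppid = fields[0], int(fields[1])
        if state == "Z":
            zombies.append((int(entry), ppid, name))
    return zombies


if __name__ == "__main__":
    for pid, ppid, name in find_zombies():
        print(f"zombie {pid} ({name}) is waiting to be reaped by parent {ppid}")
```

If the parent PID reported for the defunct sh is the agent's PID (3741 in 
the trace above), that would at least confirm that the agent itself never 
collects the child's exit status.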

> You could also try just sidestepping the problem by using cron to 
> launch puppetd --runonce at your desired intervals, instead of leaving 
> puppetd running in daemon mode.  A fair number of people seem to run 
> Puppet that way, and it has some advantages. 
>

Thanks, that's a good idea that I will probably have to resort to if the 
problem doesn't go away.

Andreas

