On Friday, January 6, 2012 5:31:34 PM UTC+1, jcbollinger wrote:
>
> Nothing in your log suggests that the Puppet agent is doing any work
> when it fails. It appears to apply a catalog successfully, then
> create a report successfully, then nothing else. That doesn't seem
> like a problem in a module. Nevertheless, you could try removing
> classes from the affected node's configuration and testing whether
> Puppet still freezes.
>
John, thanks for your reply. I'll be deploying a node that includes no modules at all and see if a zombie process appears again.

> You said the agent runs for several hours before it hangs. Does it
> perform multiple successful runs during that time? That also would
> tend to counterindicate a problem in your manifests.
>

Yes, the agents perform several runs (with no changes to the catalog) and then simply freeze up, waiting for the defunct sh process to return.

> I'm suspicious that something else on your systems is interfering with
> the Puppet process; some kind of service manager, for example. You'll
> have to say whether that's a reasonable guess. Alternatively, you may
> have a system-level bug; there have been a few Ruby bugs and kernel
> regressions that interfered with Puppet operation.
>

Those are all pretty plain Ubuntu 10.04.3 server installations (both i386 and x86_64), especially the ones I deployed this week, which aren't in production yet. What kind of service manager could there even be that interferes?

> You could try using strace to determine where the failure happens,
> though that's not as simple as it may sound.
>

Simply trying to strace the zombie process only results in an "Operation not permitted". The agent process shows these lines repeatedly:

Process 3741 attached - interrupt to quit
select(8, [7], NULL, NULL, {1, 723393}) = 0 (Timeout)
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
select(8, [7], NULL, NULL, {2, 0}) = 0 (Timeout)
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
...

That doesn't tell me anything other than that the puppet agent is blocking on select() with a timeout of two seconds.

> You could also try just sidestepping the problem by using cron to
> launch puppetd --runonce at your desired intervals, instead of leaving
> puppetd running in daemon mode. A fair number of people seem to run
> Puppet that way, and it has some advantages.
>

Thanks, that's a good idea that I will probably have to resort to if the problem doesn't go away.

Andreas
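
Since strace cannot attach to a zombie, it may be more useful to list the defunct processes together with their parent PIDs and then look at the parent instead. What follows is only a minimal Python 3 sketch, assuming a Linux /proc filesystem; the helper names parse_stat and find_zombies are made up for illustration and are not part of Puppet or any other tool mentioned in this thread.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state, ppid)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    left, _, rest = text.partition("(")
    comm, _, tail = rest.rpartition(")")
    fields = tail.split()
    # After the command name, the next two fields are state and parent PID.
    return int(left), comm, fields[0], int(fields[1])


def find_zombies(proc_root="/proc"):
    """Return a list of (pid, comm, ppid) for processes in state 'Z'."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state, ppid = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away or the entry was unreadable
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies


if __name__ == "__main__":
    for pid, comm, ppid in find_zombies():
        print(f"zombie pid={pid} comm={comm} parent={ppid}")
```

If the parent PID reported for the defunct sh turns out to be the agent itself, that would at least confirm that the agent never reaps its child; it would not by itself explain why.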