On Sun, 8 Mar 2009, Mads Toftum wrote:
I've got a system that's behaving a bit odd. It's running a classic network service that's got one parent proc and spawns one child proc per connection. It's fine with about 100 or so concurrent child procs, but once it starts hitting a higher number, <defunct> procs start appearing.
Defunct processes are due to the parent process not making a wait(3C) or waitpid(3C) call for the process ID of the child. Unless the parent process has set the signal handling for SIGCHLD to SIG_IGN, each child process that exits remains in the process table until the parent process invokes waitpid(3C) to obtain its exit status.
A large number of defunct processes indicates either that the parent process went away (e.g. crashed) or that the parent process is not properly designed/implemented to execute waitpid(3C) for each of the child processes that it starts. If the parent process goes away, then the child process becomes owned by 'init' (PID 1).
If the parent process crashes and gets automatically restarted, then there may be many child processes which are now owned by 'init' and it may take some time for them to be harvested. If the parent process continues running but is slow to execute waitpid(3C), or has an arbitrary limit in its process management which causes it to fail to execute waitpid(3C), then there may be many child processes which have exited but still clog the process listing. In this case, the parent process ID shown for the defunct processes should be that of the original parent process, which should still be running.
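For the second case (a parent that keeps running but falls behind), here is a rough sketch of one way to do the bookkeeping so that children are always collected. Everything named here (live_children, handle_connection, spawn_child, collect_children, and the cap passed as max_children) is hypothetical and only illustrates the idea; I have not seen your code.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counter of children started but not yet reaped. */
static int live_children = 0;

/* Placeholder for whatever the child does for one connection. */
static void handle_connection(void)
{
}

/* Fork one child; the parent counts it so it knows how many
 * exit statuses it still has to collect. */
pid_t spawn_child(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        handle_connection();
        _exit(0);
    }
    if (pid > 0)
        live_children++;
    return pid;
}

/* Collect exited children. Below the cap this never blocks; at or
 * above the cap it blocks until a child exits, so the limit slows
 * the parent down instead of making it skip waitpid().
 * Returns the number of children reaped. */
int collect_children(int max_children)
{
    int reaped = 0;

    while (live_children > 0) {
        int flags = (live_children >= max_children) ? 0 : WNOHANG;
        pid_t pid = waitpid(-1, NULL, flags);

        if (pid > 0) {
            live_children--;
            reaped++;
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;              /* interrupted, try again */
        if (pid == -1 && errno == ECHILD)
            live_children = 0;     /* nothing left to wait for */
        break;                     /* 0: remaining children still running */
    }
    return reaped;
}
```

The parent would call collect_children() once per pass through its accept loop, before calling spawn_child() for the next connection.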
Regardless, it seems that your application has a bug.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org