On Sat, Apr 07, 2007 at 11:09:55AM -0600, Philip Guenther wrote:
> Instead of separating the obtaining of the pid from the actual
> reaping, you can instead separate the blocking from the return of the
> pid+reaping. That lets you lock the datastructure only when you know
> wait() won't block. To block until a child is ready to be reaped, use
> SIGCHLD, blocking it when you aren't ready

Hmm, I hadn't thought of doing the actual reaping in a SIGCHLD handler,
and blocking SIGCHLD while doing the fork work. Or, as in your example,
using sigsuspend to wait for an instance of a SIGCHLD signal to occur,
and only then calling waitpid().

This then leaves me with a bunch of edge cases to satisfy myself are
being handled correctly. For example, in your my_wait_loop():

> void my_wait_loop(void)
> {
>     pid_t pid;
>     int cstat, err;
>
>     for (;;)
>     {
>         while (!saw_sigchld)
>         {
>             sigsuspend(&orig_sigset);
>         }
>
>         saw_sigchld = 0;
>
>         lock_the_shared_datastructure();
>         do
>         {
>             pid = waitpid(-1, &cstat, WNOHANG);
>         } while (pid < 0 && (err = errno) == EINTR);
>         if (pid > 0)
>         {
>             handle_exited_child(pid);
>         }
>         else if (pid == 0 || err == ECHILD)
>         {
>             /* bogus SIGCHLD, just ignore it */
>         }
>         else
>         {
>             /* should not occur (EFAULT? EINVAL?) */
>             syslog("unexpected waitpid() error: %s", strerror(err));
>         }
>         unlock_the_shared_datastructure();
>     }
> }

Suppose child 1 dies, causing a SIGCHLD to be pending, and then a second
child dies, before sigsuspend() unblocks the signal. sigsuspend returns,
and one child is reaped. Next time around the loop, will the second
child be reaped? If so, why?

I'm not saying that anything is actually wrong with the code you've
provided; rather, that it's difficult for me to understand the
subtleties involved in asynchronous signal-driven programming. And
that's with a copy of the Stevens book beside me :-)

Many thanks for giving me more food for thought.

Regards,

Brian.
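
The two-children scenario above can be tried with a small experiment. What
follows is only a sketch under stated assumptions, not a fix to the quoted
code: it forks children that exit while SIGCHLD is blocked, waits in
sigsuspend(), and then calls waitpid() with WNOHANG repeatedly until nothing
more is ready, instead of once per wakeup. POSIX leaves it
implementation-defined whether several pending instances of an ordinary
signal are delivered more than once, so the handler may well run fewer times
than there are dead children. The names on_sigchld, handler_calls, wait_mask
and reap_children_demo are hypothetical and exist only for this
illustration; saw_sigchld and orig_sigset mirror the quoted loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t saw_sigchld;   /* flag name taken from the quoted loop */
static volatile sig_atomic_t handler_calls; /* hypothetical counter, illustration only */

static void on_sigchld(int signo)
{
    (void)signo;
    saw_sigchld = 1;
    handler_calls++;
}

/*
 * Hypothetical experiment: fork nchildren children that exit at once while
 * SIGCHLD is blocked, then reap them all. Returns the number of children
 * reaped (or -1 on setup failure) and stores the number of handler
 * invocations in *calls_out.
 */
int reap_children_demo(int nchildren, int *calls_out)
{
    struct sigaction sa, old_sa;
    sigset_t block, orig_sigset, wait_mask;
    int i, cstat, forked = 0, reaped = 0;
    pid_t pid;

    assert(nchildren > 0);

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &old_sa) < 0)
        return -1;

    /* Block SIGCHLD so it is only ever delivered inside sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &orig_sigset) < 0)
        return -1;
    wait_mask = orig_sigset;
    sigdelset(&wait_mask, SIGCHLD); /* ensure SIGCHLD is deliverable while waiting */

    saw_sigchld = 0;
    handler_calls = 0;

    for (i = 0; i < nchildren; i++)
    {
        pid = fork();
        if (pid == 0)
            _exit(0);           /* child: die immediately */
        if (pid > 0)
            forked++;
    }

    sleep(1); /* crude: let the children die while SIGCHLD is still blocked */

    while (reaped < forked)
    {
        while (!saw_sigchld)
            sigsuspend(&wait_mask);
        saw_sigchld = 0;

        /* Drain: one wakeup may stand for several dead children. */
        for (;;)
        {
            pid = waitpid(-1, &cstat, WNOHANG);
            if (pid > 0)
                reaped++;
            else if (pid < 0 && errno == EINTR)
                continue;
            else
            {
                if (pid < 0)
                    forked = reaped; /* ECHILD or other error: stop waiting */
                break;               /* pid == 0: children exist, none ready yet */
            }
        }
    }

    if (calls_out != NULL)
        *calls_out = (int)handler_calls;
    sigaction(SIGCHLD, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &orig_sigset, NULL);
    return reaped;
}
```

Calling reap_children_demo(2, &calls) from a process with no other children
should return 2, while calls may come back smaller than 2 on systems that
collapse pending SIGCHLD instances. With a single waitpid() per wakeup
instead of the inner drain loop, a second dead child could stay unreaped
until another SIGCHLD happened to arrive.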