What should system(3) do when the signal action for SIGCHLD is SIG_IGN, or has SA_NOCLDWAIT set?
Setting SIGCHLD to SIG_IGN has the effect of reaping zombie children
automatically, so that calling wait(2) is unnecessary to reap them.
Further, wait(2) doesn't return _at all_ until the last child has
exited.  These semantics, the same as for setting SA_NOCLDWAIT, are
enshrined in POSIX:

    If the calling process has SA_NOCLDWAIT set or has SIGCHLD set to
    SIG_IGN, and the process has no unwaited for children that were
    transformed into zombie processes, the calling thread will block
    until all of the children of the process containing the calling
    thread terminate, and wait() and waitpid() will fail and set errno
    to [ECHILD].

    https://pubs.opengroup.org/onlinepubs/7908799/xsh/wait.html

So if a process already has a child, and calls system(3) as it is
currently implemented in libc in ~all versions of NetBSD, system(3)
will hang until the existing child exits, which may be indefinitely.

This manifests in newer versions of ctwm, which set SIGCHLD to SIG_IGN,
if you have a .xsession file that does something like:

    xterm &
    xclock &
    exec ctwm

This causes ctwm to start with two children already, which in turn
causes system(3) to hang when you try to start an application from the
ctwm menu.

The ctwm hang led to PR kern/57527 (https://gnats.netbsd.org/57527;
`kern' because at first it looked like a missing wakeup in the kernel,
before we realized this is exactly how POSIX expects SIG_IGN and
SA_NOCLDWAIT to behave), which has some litmus tests for the semantics
and draft code to mitigate the situation in system(3).

So, should we do anything about this in system(3)?

Pro:

- Makes existing code like ctwm work.

Cons:

- POSIX doesn't ask system(3) to work when SIGCHLD is set to SIG_IGN
  or when it has SA_NOCLDWAIT set, so this code is nonportable anyway.
  It might break on other systems too, so breakage on NetBSD leading
  to an upstream bug report is helpful.

- Changing signal actions has the side effect of clearing the signal
  queue, and I don't see a way around that.
The alternative would be to say: don't do that.  Fix the buggy code
that calls system(3) with SIGCHLD ignored, either by having it set a
signal handler that calls waitpid(-1, NULL, WNOHANG) in a loop until no
exited children remain, or by having it use something other than
system(3).