https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94143
Bug ID: 94143
Summary: [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls
Product: gcc
Version: 9.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libfortran
Assignee: unassigned at gcc dot gnu.org
Reporter: trnka at scm dot com
Target Milestone: ---

Since PR90038 introduced a SIGCHLD handler into execute_command_line(), calling an asynchronous execute_command_line(wait=.false.) breaks all subsequent synchronous calls (no matter if those are through execute_command_line(wait=.true.) or through various libraries), because the signal handler stays around forever and indiscriminately reaps any child processes. The result is that the internal wait() at the end of system()-like calls fails with ECHILD if the signal handler fires earlier and does a wait() on that process.

Given that this is a race between the signal handler and the synchronous wait(), it's somewhat tricky to reproduce reliably. The following test case triggers it on my machine:

    program asyncexec
       implicit none
       integer :: i

       !$omp parallel default(shared)
       !$omp single
       call execute_command_line('sleep 30', wait=.false.)
       do i = 1, 10
          write(*,*) i
          call execute_command_line('/bin/true')
       end do
       !$omp end single
       !$omp end parallel
    end program

This typically leads to the following error on the first or second iteration:

    Fortran runtime error: EXECUTE_COMMAND_LINE: Termination status of the command-language interpreter cannot be obtained
    Error termination.
    Backtrace:
    #0  0x7f979747c5fa in set_cmdstat at ../../../libgfortran/intrinsics/execute_command_line.c:63
    #1  0x7f979747c829 in set_cmdstat at ../../../libgfortran/intrinsics/execute_command_line.c:58
    #2  0x7f979747c829 in execute_command_line at ../../../libgfortran/intrinsics/execute_command_line.c:133

The issue has nothing to do with OpenMP; I'm just using it to get multiple concurrent threads, to maximize the chance that the signal handler will run on a different thread before the forking thread has a chance to call wait(). In real life, this issue affects MPI applications, because MPI libraries typically spawn some background event-handling threads even if the program itself is single-threaded.

I don't see a way to work around this in user code, so I'd suggest removing the offending SIGCHLD handler as a quick "fix". That'll leave zombie processes around, but those are mostly harmless.

IMHO there are two possible proper solutions:

1) Spawn a dedicated thread to specifically wait for the PID launched by the asynchronous call, instead of a blanket wait(-1).

2) Record all asynchronously launched PIDs in a global list. The SIGCHLD handler would then extract the PID from siginfo and consult the list to see whether it should call wait().

Option #1 seems easier to implement to me. I can try to come up with a patch if desired.