Reference: <http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8788>
[Adding bug-autoconf in CC]

On Thursday 02 June 2011, Stefano Lattarini wrote:
> Hello automakers.
>
> While testing the `testsuite-work' branch on NetBSD 5, I've encountered
> a weird failure in the test `parallel-tests3.test', which actually caused
> the whole testsuite to crash (!) due to a stray SIGTERM.
>
> [SNIP]
>
> Any idea of what's going on?
>
Ah ah, got it! (I think).

The failure is due to an interaction between some features of GNU make
and some (mis)features of the NetBSD Korn Shell.  Let's see the details.

[1] The Korn shell gets selected to run the Makefile recipes
------------------------------------------------------------

On NetBSD, an autoconf-generated configure script will select /bin/ksh
as the $(SHELL) used to execute the Makefile recipes:

  $ grep 'SHELL.*=' tests/parallel-tests3.dir/*/config.log
  tests/parallel-tests3.dir/parallel/config.log:SHELL='/bin/ksh'
  tests/parallel-tests3.dir/serial/config.log:SHELL='/bin/ksh'

[2] The Korn shell has quirks w.r.t. signal handling
----------------------------------------------------

NetBSD's Korn Shell is one of those shells which try to "propagate"
terminating signals, as explained in the ``Signal Handling'' node of
the (as of today still unreleased) bleeding-edge autoconf manual; see
also these relevant links:

  <http://lists.gnu.org/archive/html/autoconf-patches/2011-09/msg00005.html>
  <https://lists.gnu.org/archive/html/bug-autoconf/2011-09/msg00004.html>
  <http://mail.opensolaris.org/pipermail/ksh93-integration-discuss/2009-February/004121.html>

And in fact, NetBSD's Korn Shell even seems to propagate a fatal signal
it has received *to its whole process group*!  Let's see a few examples:

  $ /bin/sh -c '/bin/sh -c "kill -15 \$\$"; echo alive'
  [1] Terminated /bin/sh -c "kill...
  alive

  $ /bin/ksh -c '/bin/sh -c "kill -15 \$\$"; echo alive'
  Terminated
  alive

  # ksh apparently terminates its parent
  $ /bin/sh -c '/bin/ksh -c "kill -15 \$\$"; echo alive'
  Terminated

  $ /bin/ksh -c '/bin/ksh -c "kill -15 \$\$"; echo alive'
  Terminated
  Terminated

Just to be sure, let's try to trace the system calls made by the Korn
shell:

  $ ktrace /bin/sh -c '
  >   echo parent: $$
  >   ktrace -a /bin/ksh -c "echo child: \$\$; kill -15 \$\$"
  >   echo alive '
  parent: 20429
  child: 4829
  Terminated
  $ kdump ktrace.out | grep -i sig | grep -v __sig
  4829 1 ksh CALL kill(0x12dd, SIGTERM)
  4829 1 ksh PSIG SIGTERM caught handler=0x420810 mask=(): code=SI_USER sent by pid=4829, uid=1242)
  4829 1 ksh CALL kill(0, SIGTERM)
  4829 1 ksh PSIG SIGTERM SIG_DFL: code=SI_USER sent by pid=4829, uid=1242)
  20429 1 sh PSIG SIGTERM SIG_DFL: code=SI_USER sent by pid=4829, uid=1242)

(Note that `0x12dd' is decimal 4829).

[3] GNU make propagates signals to the running recipes
------------------------------------------------------

If GNU make receives a terminating signal while it's updating some
target(s), it propagates that signal to the currently-executing
recipe(s):

  $ cat Makefile
  all: 1 2
  1 2:
          @trap 'echo got SIGTERM; exit 77' 15; while :; do :; done
  $ gmake -j2 &
  [1] 5980
  $ kill $!
  got SIGTERM
  got SIGTERM
  gmake: *** [2] Error 77
  gmake: *** [1] Error 77

(FWIW, I find this to be a helpful and rational behaviour).

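For anyone who wants to poke at the recipe side of this without GNU make
at hand, here is a rough sketch of what a single recipe shell goes
through when it is sent a SIGTERM.  It is only a stand-in under stated
assumptions: the script plays the role of make by hand, the variable
names (`log', `recipe_pid', `status', `output') are made up for the
example, and the one-second sleep is a crude guess at how long the child
needs to install its trap.

```shell
#!/bin/sh
# Sketch only: a make-free stand-in for one recipe of the Makefile above.
# All variable names here are hypothetical, chosen for illustration.

log=$(mktemp) || exit 1

# Same command line as the recipe: trap SIGTERM (15), report it, exit 77.
sh -c 'trap "echo got SIGTERM; exit 77" 15; while :; do :; done' >"$log" &
recipe_pid=$!

# Crude synchronization (an assumption): give the child time to install
# its trap before we signal it, as make would when relaying the signal.
sleep 1
kill -15 "$recipe_pid"

# Collect the child's exit status; this is what make reports as an error.
status=0
wait "$recipe_pid" || status=$?

output=$(cat "$log")
rm -f "$log"

echo "recipe said: $output"
echo "recipe exit status: $status"
```

With a shell that behaves like the /bin/sh used above, this should report
`got SIGTERM' and an exit status of 77, matching the `Error 77' lines
printed by gmake.  It says nothing about the process-group propagation
done by ksh, which is a separate issue.
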
[4] Putting it all together
---------------------------

So here is my diagnosis of what happens when `parallel-tests3.test' is
run on NetBSD with GNU make:

 1) various setup/preparation commands get executed in this script;
    the Korn shell gets selected to run the recipes of the Makefile;

 2) "make -j1 check" is launched in the background:
      cd serial
      $MAKE -j1 check &

 3) some more commands get run, and they conclude before the
    background make process launched in (2) has concluded;

 4) the shell executing `parallel-tests3.test' explicitly kills the
    still running background "make" process with a SIGTERM:
      cd ..
      kill $!

 5) GNU make "relays" the SIGTERM to the Korn shell executing the
    still running recipe(s);

 6) in turn, the Korn shell relays the SIGTERM to all processes in
    its process group;

 7) this includes the top-level make process that is running the
    automake testsuite (if any), which explains the crash that is
    the object of this bug report.

I'm not 100% positive that point (7) is completely correct, but I'm
running out of time now, so I'll settle for this explanation; kudos
to anyone who can give some confirmation about the correctness of
point (7)!

 -*-*-*-

Now, the right fix for the bug is *not* to work around this behaviour
of the Korn shell; rather, we should fix the suspicious logic of the
`parallel-tests3.test' script, which was also causing a testsuite hang
on FreeBSD.  Patch coming up shortly.

And it goes without saying that this horrendous incompatibility of
NetBSD's Korn Shell should be documented in the autoconf manual; I
might give it a shot in the next few days if nobody beats me to it.

Regards,
  Stefano