On Fri, Feb 11, 2011 at 1:30 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> The other case is that the child process was quick and already exited.
> You get ^C, but the child never did. When you do the waitpid(), you'll
> never get the EINTR, because there was no actual wait.

Ok, so here's a suggested patch. It took much longer than expected,
because I found another independent race while trying to fix this:
bash did the "restore_sigint_handler()" _after_ having already tested
the "wait_sigint_received" value, so what could happen is that the
SIGINT would happen _after_ the test of that flag, but before the
signal handler was restored, so bash would (once more) drop the ^C on
the floor.

So there are two fixes here, and while I have by no means done
exhaustive testing, this patch seems to make the problem with dropped
^C really really hard for me to trigger.

I do believe it's still racy, it's just that I think it is much less
racy than it used to be.

And maybe I introduced some new bug. There may be a reason why the
sigint-handler was restored too late. But it not only passes my ^C
tests, it seems to pass "make test" too, so it can't be horribly
broken.

And bash _was_ horribly broken wrt ^C before.

I tried to make the "child_blocked_sigint" logic very trivial and
obvious, and I added comments for it. But if there are any questions
about the patch, feel free to holler.

The basic rule is:
 - we only think that a child blocked sigint if we actually get EINTR
from a system call (and the signal handler set wait_sigint_received)
 - if the child returns with WSIGNALLED and WTERMSIG()==SIGINT, then
we say "it clearly didn't block it after all, or it emulated dying
with SIGINT, so we clear that previous heuristic".
 - then, instead of checking how the child died, we just use the new
'child_blocked_sigint' flag instead.

So instead of testing

  wait_sigint_received && WIFSIGNALED (s) && WTERMSIG (s) == SIGINT

it now tests

  wait_sigint_received && !child_blocked_sigint

which seems to match the comments better anyway.

Long story short: maybe the patch is buggy. But the issue is real, and
the patch looks largely sensible to me.

                              Linus
 jobs.c |   35 +++++++++++++++++++++++++++--------
 1 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/jobs.c b/jobs.c
index df13ad9..8b9320f 100644
--- a/jobs.c
+++ b/jobs.c
@@ -2193,7 +2193,7 @@ restore_sigint_handler ()
     }
 }
 
-static int wait_sigint_received;
+static int wait_sigint_received, child_blocked_sigint;
 
 /* Handle SIGINT while we are waiting for children in a script to exit.
    The `wait' builtin should be interruptible, but all others should be
@@ -2365,6 +2365,7 @@ wait_for (pid)
 
   /* This is possibly a race condition -- should it go in stop_pipeline? */
   wait_sigint_received = 0;
+  child_blocked_sigint = 0;
   if (job_control == 0 || (subshell_environment&SUBSHELL_COMSUB))
     {
       old_sigint_handler = set_signal_handler (SIGINT, wait_sigint_handler);
@@ -2424,6 +2425,18 @@ wait_for (pid)
 	  sigaction (SIGCHLD, &oact, (struct sigaction *)NULL);
 	  sigprocmask (SIG_SETMASK, &chldset, (sigset_t *)NULL);
 #  endif
+	  /* If the waitchld returned EINTR, and the shell got a SIGINT,
+	     then the child has not died yet, and we assume that the
+	     child has blocked SIGINT. In that case, we require that the
+	     child return with WSIGTERM() == SIGINT to actually consider
+	     the ^C relevant. This is racy (the child may be in the
+	     process of exiting and bash reacted to the EINTR first),
+	     but this makes the race window much much smaller */
+	  if (r == -1 && errno == EINTR && wait_sigint_received)
+	    {
+	      child_blocked_sigint = 1;
+	    }
+
 	  queue_sigchld = 0;
 	  if (r == -1 && errno == ECHILD && this_shell_builtin == wait_builtin)
 	    {
@@ -2461,6 +2474,16 @@ wait_for (pid)
     }
   while (PRUNNING (child) || (job != NO_JOB && RUNNING (job)));
 
+  /* If the child explicitly exited with a SIGINT, we dismiss the
+     fact that we guessed it blocked sigint earlier */
+  if (WIFSIGNALED(child->status) && WTERMSIG(child->status) == SIGINT)
+    {
+      child_blocked_sigint = 0;
+    }
+
+  /* Restore the original SIGINT signal handler. */
+  restore_sigint_handler ();
+
   /* The exit state of the command is either the termination state of the
      child, or the termination state of the job.  If a job, the status
      of the last child in the pipeline is the significant one.  If the command
@@ -2556,10 +2579,9 @@ if (job == NO_JOB)
 	     pass the status back to our parent. */
 	  s = job_signal_status (job);
 	
-	  if (WIFSIGNALED (s) && WTERMSIG (s) == SIGINT && signal_is_trapped (SIGINT) == 0)
+	  if (!child_blocked_sigint && signal_is_trapped (SIGINT) == 0)
 	    {
 	      UNBLOCK_CHILD (oset);
-	      restore_sigint_handler ();
 	      old_sigint_handler = set_signal_handler (SIGINT, SIG_DFL);
 	      if (old_sigint_handler == SIG_IGN)
 		restore_sigint_handler ();
@@ -2585,9 +2607,6 @@ wait_for_return:
 
   UNBLOCK_CHILD (oset);
 
-  /* Restore the original SIGINT signal handler before we return. */
-  restore_sigint_handler ();
-
   return (termination_state);
 }
 
@@ -3277,7 +3296,7 @@ set_job_status_and_cleanup (job)
 	 does not exit due to SIGINT, run the trap handler but do not
 	 otherwise act as if we got the interrupt. */
       if (wait_sigint_received && interactive_shell == 0 &&
-	  WIFSIGNALED (child->status) == 0 && IS_FOREGROUND (job) &&
+	  child_blocked_sigint && IS_FOREGROUND (job) &&
 	  signal_is_trapped (SIGINT))
 	{
 	  int old_frozen;
@@ -3299,7 +3318,7 @@ set_job_status_and_cleanup (job)
 	 signals are sent to process groups) or via kill(2) to the foreground
 	 process by another process (or itself).  If the shell did receive the
 	 SIGINT, it needs to perform normal SIGINT processing. */
-      else if (wait_sigint_received && (WTERMSIG (child->status) == SIGINT) &&
+      else if (wait_sigint_received && !child_blocked_sigint &&
 	      IS_FOREGROUND (job) && IS_JOBCONTROL (job) == 0)
 	{
 	  int old_frozen;

Reply via email to