This bug is subtle and only shows up in one specific situation: a
real-time threaded process is in the middle of a core dump when someone
hits it with a SIGKILL. When it does happen, though, the resulting
deadlock leaves the system hosed, and you have to reboot to recover.

That is not good for real-time priority-preemption applications like our
telephony application, which runs 90+ real-time (SCHED_FIFO and SCHED_RR)
processes, many of them multi-threaded, interacting with each other for
high-volume call processing.

- Bhavesh

Also, for your reading pleasure, here is a complete analysis of how the
system gets into a deadlock because of this bug. I wanted to post it
because I spent several hours on the analysis.

-- 
Bhavesh P. Davda | Distinguished Member of Technical Staff | Avaya |
1300 West 120th Avenue | B3-B03 | Westminster, CO 80234 | U.S.A. |
Voice/Fax: 303.538.4438 | [EMAIL PROTECTED]
diff -Naur linux-2.6.12.5/kernel/signal.c linux-2.6.12.5-sigfix/kernel/signal.c
--- linux-2.6.12.5/kernel/signal.c	2005-08-14 18:20:18.000000000 -0600
+++ linux-2.6.12.5-sigfix/kernel/signal.c	2005-08-17 11:36:20.547600092 -0600
@@ -686,7 +686,7 @@
 {
 	struct task_struct *t;
 
-	if (p->flags & SIGNAL_GROUP_EXIT)
+	if (p->signal->flags & SIGNAL_GROUP_EXIT)
 		/*
 		 * The process is in the middle of dying already.
 		 */

When bash sends SIGABRT to the rt-pthreaded-app main thread:

bash: sys_kill(pid, SIGABRT)
      kill_something_info(SIGABRT, &info, pid)
      kill_proc_info(SIGABRT, info, pid)
      p = find_task_by_pid(pid), group_send_sig_info(SIGABRT, info, p)
      __group_send_sig_info(SIGABRT, info, p)
      __group_complete_signal(SIGABRT, p)
Still bash, p==rt-pthreaded-app main thread:

static void __group_complete_signal(int sig, struct task_struct *p)
{
        unsigned int mask;
        struct task_struct *t;

        /*
         * Don't bother traced and stopped tasks (but
         * SIGKILL will punch through that).
         */
        mask = TASK_STOPPED | TASK_TRACED;
        if (sig == SIGKILL)
                mask = 0;

==> mask == TASK_STOPPED|TASK_TRACED
        /*
         * Now find a thread we can wake up to take the signal off the queue.
         *
         * If the main thread wants the signal, it gets first crack.
         * Probably the least surprising to the average bear.
         */
        if (wants_signal(sig, p, mask))
                t = p;
==> t = p (rt-pthreaded-app main thread)
        else if (thread_group_empty(p))
                /*
                 * There is just one thread and it does not need to be woken.
                 * It will dequeue unblocked signals before it runs again.
                 */
                return;
        else {
                /*
                 * Otherwise try to find a suitable thread.
                 */
                t = p->signal->curr_target;
                if (t == NULL)
                        /* restart balancing at this thread */
                        t = p->signal->curr_target = p;
                BUG_ON(t->tgid != p->tgid);

                while (!wants_signal(sig, t, mask)) {
                        t = next_thread(t);
                        if (t == p->signal->curr_target)
                                /*
                                 * No thread needs to be woken.
                                 * Any eligible threads will see
                                 * the signal in the queue soon.
                                 */
                                return;
                }
                p->signal->curr_target = t;
        }

        /*
         * Found a killable thread.  If the signal will be fatal,
         * then start taking the whole group down immediately.
         */
        if (sig_fatal(p, sig) && !(p->signal->flags & SIGNAL_GROUP_EXIT) &&
            !sigismember(&t->real_blocked, sig) &&
            (sig == SIGKILL || !(t->ptrace & PT_PTRACED))) {
==> sig_fatal(p, SIGABRT) true
==> SIGNAL_GROUP_EXIT is not set yet
==> SIGABRT is not blocked
==> p is not PT_PTRACED
                /*
                 * This signal will be fatal to the whole group.
                 */
                if (!sig_kernel_coredump(sig)) {
==> SIGABRT is sig_kernel_coredump(), skip
                        /*
                         * Start a group exit and wake everybody up.
                         * This way we don't have other threads
                         * running and doing things after a slower
                         * thread has the fatal signal pending.
                         */
                        p->signal->flags = SIGNAL_GROUP_EXIT;
                        p->signal->group_exit_code = sig;
                        p->signal->group_stop_count = 0;
                        t = p;
                        do {
                                sigaddset(&t->pending.signal, SIGKILL);
                                signal_wake_up(t, 1);
                                t = next_thread(t);
                        } while (t != p);
                        return;
                }

                /*
                 * There will be a core dump.  We make all threads other
                 * than the chosen one go into a group stop so that nothing
                 * happens until it gets scheduled, takes the signal off
                 * the shared queue, and does the core dump.  This is a
                 * little more complicated than strictly necessary, but it
                 * keeps the signal state that winds up in the core dump
                 * unchanged from the death state, e.g. which thread had
                 * the core-dump signal unblocked.
                 */
                rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
                rm_from_queue(SIG_KERNEL_STOP_MASK, &p->signal->shared_pending);
                p->signal->group_stop_count = 0;
                p->signal->group_exit_task = t;
                t = p;
==> Start with thread being killed
                do {
                        p->signal->group_stop_count++;
==> For rt-pthreaded-app this will be done twice (for the 2 subthreads)
                        signal_wake_up(t, 0);
==> This is a no-op so far, because the subthread "t" doesn't have a signal
                        t = next_thread(t);
                } while (t != p);
                wake_up_process(p->signal->group_exit_task);
==> This wakes up the main rt-pthreaded-app thread. At this point in time,
==> group_stop_count == 2, but SIGNAL_GROUP_EXIT is still not set
                return;
==> BASH IS DONE.
        }

        /*
         * The signal is already in the shared-pending queue.
         * Tell the chosen thread to wake up and dequeue it.
         */
        signal_wake_up(t, sig == SIGKILL);
        return;
}
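
The practical upshot of the walk-through above is that the two fatal paths leave the group in different states: a plain fatal signal sets SIGNAL_GROUP_EXIT right away, while a coredump signal only starts a group stop and leaves the flag for do_coredump() to set later. A rough user-space sketch of that difference follows. It is not kernel code: `struct group_model`, `complete_fatal_signal()` and `stop_without_exit_flag()` are hypothetical names, the flag value is illustrative, and `nr_to_stop` simply stands for however many threads the kernel loop counts.

```c
#include <assert.h>

#define SIGNAL_GROUP_EXIT 0x8u   /* illustrative value, assumed for this sketch */

/* Hypothetical stand-in for the fields of p->signal used in the trace. */
struct group_model {
    unsigned int flags;          /* models p->signal->flags */
    int group_stop_count;        /* models p->signal->group_stop_count */
};

/* Models the tail of __group_complete_signal() for a fatal signal. */
static void complete_fatal_signal(struct group_model *g, int dumps_core,
                                  int nr_to_stop)
{
    if (!dumps_core) {
        /* Non-coredump path: group exit starts immediately. */
        g->flags = SIGNAL_GROUP_EXIT;
        g->group_stop_count = 0;
        return;
    }
    /* Coredump path: group stop only; the exit flag is left untouched. */
    g->group_stop_count = nr_to_stop;
}

/* True when threads entering handle_group_stop() would park themselves. */
static int stop_without_exit_flag(const struct group_model *g)
{
    return g->group_stop_count > 0 && !(g->flags & SIGNAL_GROUP_EXIT);
}
```

In this model, `complete_fatal_signal(&g, 1, 2)` leaves `stop_without_exit_flag(&g)` true, which is the state the rest of the analysis depends on: subthreads will stop rather than exit until something sets SIGNAL_GROUP_EXIT and keeps it set.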


rt-pthreaded-app main thread:
======================
Coming out of schedule(), it will look for pending signals

do_notify_resume()
do_signal()
        signr = get_signal_to_deliver(&info, &ka, regs, NULL);
get_signal_to_deliver()
        if (unlikely(current->signal->group_stop_count > 0) &&
                handle_group_stop())
==> group_stop_count is 2, so call handle_group_stop()
handle_group_stop()
        if (current->signal->group_exit_task == current) {
==> This is true
                /* Group stop is so we can do a core dump,
                 * We are the initiating thread, so get on with it. */
                current->signal->group_exit_task = NULL;
                return 0;
        }
==> back to get_signal_to_deliver()
                signr = dequeue_signal(current, mask, info);
==> signr == SIGABRT
        if (!signr) break; /* will return 0 */ (not true, signr==SIGABRT)
        if ((current->ptrace & PT_PTRACED) && signr != SIGKILL) {
        (not true, skip)
        ka = &current->sighand->action[signr-1];
        if (ka->sa.sa_handler == SIG_IGN) /* Do nothing.  */
                continue; (not true, handler == SIG_DFL)
        if (ka->sa.sa_handler != SIG_DFL) {
        (not true, skip)
        if (sig_kernel_ignore(signr)) /* Default is nothing. */ continue;
        (not true, skip)
        if (current->pid == 1) continue; (not true, skip)
        if (sig_kernel_stop(signr)) { (not true, skip)
        /* Anything else is fatal, maybe with a core dump. */
        current->flags |= PF_SIGNALED;
        if (sig_kernel_coredump(signr)) {
==> TRUE
                do_coredump((long)signr, signr, regs);

do_coredump(SIGABRT, SIGABRT, regs)
        current->signal->flags = SIGNAL_GROUP_EXIT;
==> Finally we set SIGNAL_GROUP_EXIT here
        current->signal->group_exit_code = exit_code;
==> group_exit_code == SIGABRT
        coredump_wait(mm);

coredump_wait(mm)
        mm->core_waiters++; /* let other threads block */
        /* give other threads a chance to run: */
        yield();
        zap_threads(mm);


zap_threads(mm)
        do_each_thread(g,p)
                if (mm == p->mm && p != tsk) {
                        force_sig_specific(SIGKILL, p);
==> This is where the rt-pthreaded-app subthreads are sent a SIGKILL

force_sig_specific(SIGKILL, p)
        specific_send_sig_info(SIGKILL, (void *)2, t);
specific_send_sig_info(SIGKILL, 2, t)
        ret = send_signal(SIGKILL, 2, t, &t->pending);
send_signal(SIGKILL, 2, t, &t->pending)
        /*
         * fast-pathed signals for kernel-internal things like SIGSTOP
         * or SIGKILL.
         */
        if ((unsigned long)info == 2) goto out_set;
        (True)
        sigaddset(&signals->signal, sig);
        return ret; // returns 0
Back to specific_send_sig_info(SIGKILL, 2, t)
        if (!ret && !sigismember(&t->blocked, sig))
                signal_wake_up(t, sig == SIGKILL);
        (True)
signal_wake_up(t, TRUE)
        set_tsk_thread_flag(t, TIF_SIGPENDING);
        mask = TASK_INTERRUPTIBLE;
        if (resume) (True)
                mask |= TASK_STOPPED | TASK_TRACED;
        if (!wake_up_state(t, mask))
                kick_process(t)
==> This will wake up rt-pthreaded-app subthreads whether they are in
==> TASK_INTERRUPTIBLE, TASK_STOPPED, or TASK_TRACED states
==> THIS WON'T WAKE UP TASK_UNINTERRUPTIBLE THREADS

==> At this point in time:
==> group_stop_count == 2, SIGNAL_GROUP_EXIT is set (in the signal
==> struct shared by all threads)
                        mm->core_waiters++;
==> This finally becomes 3 (main + 2 subthreads)
                }
        while_each_thread(g,p);

Back to coredump_wait()
        if (--mm->core_waiters) {
==> Main thread decrements core_waiters back to 2.
                up_write(&mm->mmap_sem);
                wait_for_completion(&startup_done);
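
The wake-up rule in signal_wake_up() traced above decides which sleeping threads a signal can actually reach, so here is a small user-space sketch of just that rule. It is a model, not the kernel function: `wake_mask()` and `would_wake()` are hypothetical helpers, and the state bit values are assumed for illustration.

```c
#include <assert.h>

/* Illustrative task-state bits, assumed for this sketch. */
#define TASK_INTERRUPTIBLE    1u
#define TASK_UNINTERRUPTIBLE  2u
#define TASK_STOPPED          4u
#define TASK_TRACED           8u

/* Mirrors the mask computation shown in the signal_wake_up() trace. */
static unsigned int wake_mask(int resume)
{
    unsigned int mask = TASK_INTERRUPTIBLE;

    if (resume)
        mask |= TASK_STOPPED | TASK_TRACED;
    return mask;
}

/* Nonzero if a task sleeping in task_state would be woken. */
static int would_wake(unsigned int task_state, int resume)
{
    return (task_state & wake_mask(resume)) != 0;
}
```

Two cases matter for this bug: `would_wake(TASK_STOPPED, 0)` is 0, so an ordinary signal wake-up does not restart a stopped thread, and `would_wake(TASK_UNINTERRUPTIBLE, 1)` is 0, so even SIGKILL does not reach a thread sitting in wait_for_completion().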


NOW, IF THE MAIN rt-pthreaded-app THREAD IS SENT A SIGKILL WHILE WAITING

handle_stop_signal()
        if (p->flags & SIGNAL_GROUP_EXIT) return;
***** WRONG CHECK! SHOULD BE (p->signal->flags & SIGNAL_GROUP_EXIT) *****
        else if (sig == SIGKILL) {
                p->signal->flags = 0;
        }
********* WHOOPS! Just cleared SIGNAL_GROUP_EXIT **************
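
To make the wrong check concrete, here is a minimal user-space sketch of the two variants side by side. It is not the kernel source: the structs are pared down to the two flag words involved, the `model_` names and `flags_after_sigkill()` are hypothetical, and the bit values are illustrative. The only point is that the task's flag word and the group's signal flag word are separate, so testing a SIGNAL_* bit against the former says nothing about a group exit.

```c
#include <assert.h>

/* Illustrative bit values, assumed for this sketch. */
#define PF_SIGNALED        0x00000400u  /* lives in the per-task flag word */
#define SIGNAL_GROUP_EXIT  0x00000008u  /* lives in the shared signal flag word */
#define MODEL_SIGKILL      9

struct model_signal { unsigned int flags; };   /* shared by the thread group */
struct model_task {
    unsigned int flags;                        /* per-task PF_* bits */
    struct model_signal *signal;
};

/* Buggy variant: looks for SIGNAL_GROUP_EXIT in the task's own flags. */
static int handle_stop_signal_buggy(int sig, struct model_task *p)
{
    if (p->flags & SIGNAL_GROUP_EXIT)
        return 1;                  /* would bail out; not taken here */
    if (sig == MODEL_SIGKILL)
        p->signal->flags = 0;      /* wipes SIGNAL_GROUP_EXIT mid-coredump */
    return 0;
}

/* Fixed variant: looks in the shared signal struct, as in the patch. */
static int handle_stop_signal_fixed(int sig, struct model_task *p)
{
    if (p->signal->flags & SIGNAL_GROUP_EXIT)
        return 1;                  /* group is already dying, leave it alone */
    if (sig == MODEL_SIGKILL)
        p->signal->flags = 0;
    return 0;
}

/* Sends SIGKILL to a task that is mid-coredump; returns the signal flags. */
static unsigned int flags_after_sigkill(int use_fix)
{
    struct model_signal sig = { SIGNAL_GROUP_EXIT };
    struct model_task main_thread = { PF_SIGNALED, &sig };

    if (use_fix)
        handle_stop_signal_fixed(MODEL_SIGKILL, &main_thread);
    else
        handle_stop_signal_buggy(MODEL_SIGKILL, &main_thread);
    return sig.flags;
}
```

With these assumed values, `flags_after_sigkill(0)` comes back as 0 (the exit flag is gone), while `flags_after_sigkill(1)` still has SIGNAL_GROUP_EXIT set.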

rt-pthreaded-app subthread:
====================
Coming out of schedule(), it will look for pending signals

do_notify_resume()
do_signal()
        signr = get_signal_to_deliver(&info, &ka, regs, NULL);
get_signal_to_deliver()
        if (unlikely(current->signal->group_stop_count > 0) &&
                handle_group_stop())
==> group_stop_count is 2, so call handle_group_stop()
handle_group_stop()
        if (current->signal->group_exit_task == current) {
        (False)
        if (current->signal->flags & SIGNAL_GROUP_EXIT) return;
        (SHOULD HAVE BEEN TRUE, BUT WAS CLEARED BY MAIN THREAD)
        stop_count = --current->signal->group_stop_count;
==> group_stop_count is now 1
        if (stop_count == 0)
                current->signal->flags = SIGNAL_STOP_STOPPED;
        current->exit_code = current->signal->group_exit_code;
==> exit_code == SIGABRT
        set_current_state(TASK_STOPPED);
==> Task enters TASK_STOPPED state
        finish_stop(stop_count);

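DEADLOCK!
==> The subthread is parked in TASK_STOPPED, so it never reaches the exit
==> path that would let the main thread's wait on startup_done finish.
==> zap_threads() has already sent its wake-up, and the main thread sits
==> in wait_for_completion() (TASK_UNINTERRUPTIBLE), which no signal wakes.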
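
The whole sequence can be replayed as a tiny user-space state model, which may help when checking the fix against the trace. This is only a sketch under simplifying assumptions: scheduling is reduced to a fixed order (do_coredump, then SIGKILL to the main thread, then each subthread runs once), the flag value is illustrative, and `struct dump_model`, `subthread_handles_signal()` and `coredump_deadlocks()` are hypothetical names.

```c
#include <assert.h>

#define SIGNAL_GROUP_EXIT 0x8u   /* illustrative value, assumed for this sketch */

struct dump_model {
    unsigned int signal_flags;   /* models current->signal->flags */
    int group_stop_count;        /* models current->signal->group_stop_count */
    int stopped_subthreads;      /* threads parked in TASK_STOPPED */
    int exited_subthreads;       /* threads that went on to exit */
};

/* One subthread passing through a simplified handle_group_stop(). */
static void subthread_handles_signal(struct dump_model *m)
{
    if (m->group_stop_count > 0 &&
        !(m->signal_flags & SIGNAL_GROUP_EXIT)) {
        m->group_stop_count--;
        m->stopped_subthreads++;     /* nobody is left to wake it */
        return;
    }
    m->exited_subthreads++;          /* takes the pending SIGKILL and exits */
}

/* Returns 1 if the dumping thread would wait forever, 0 otherwise. */
static int coredump_deadlocks(int nr_subthreads, int check_is_fixed)
{
    struct dump_model m = { 0, nr_subthreads, 0, 0 };
    int i;

    m.signal_flags = SIGNAL_GROUP_EXIT;   /* do_coredump() sets the flag */
    /* zap_threads() has queued SIGKILL and woken every subthread here. */

    /* SIGKILL arrives for the main thread: handle_stop_signal(). */
    if (!check_is_fixed)
        m.signal_flags = 0;               /* buggy check falls through */

    for (i = 0; i < nr_subthreads; i++)
        subthread_handles_signal(&m);

    return m.exited_subthreads != nr_subthreads;
}
```

In this model `coredump_deadlocks(2, 0)` returns 1 and `coredump_deadlocks(2, 1)` returns 0, matching the trace: with the corrected check the late SIGKILL leaves SIGNAL_GROUP_EXIT alone and the subthreads exit instead of stopping.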
