Keith Owens wrote:
> 
> On Mon, 13 Nov 2000 09:58:17 +0100,
> Jasper Spaans <[EMAIL PROTECTED]> wrote:
> >All right, here's another one, this time using the oops directly from the
> >console -- this seems to give better symbols.. The 'console shuts up ...'
> >works, the oops from the other CPU didn't get put out.
> 
> Ohhhh, damn!  For NMI lockups we want the console to stay live so NMI
> detection on the other cpus can be printed.  NMI is normally caused by
> spinlock problems and it is useful to know what the other cpus are
> doing.  Andrew, do you want to have a go at fixing this?

Uh, sure - I just _love_ running fsck :)  I'm working on this stuff
at present.  That wake_up in printk() is baaaaad...

> >Will try test11-pre3 + kdb this afternoon, if it compiles.
> 
> Patch kdb-v1.5-2.4.0-test11-pre3.gz should be OK.

It would be very, very interesting to see where the other CPU is.

I can see one bug from Jasper's trace: setscheduler() does:

        spin_lock_irq(&runqueue_lock);
        read_lock(&tasklist_lock);

whereas the exit_notify->do_notify_parent->send_sig_info->wake_up_process
path does:

        write_lock_irq(&tasklist_lock);
        spin_lock_irqsave(&runqueue_lock, flags);

Death by double deadlock.  But I doubt if setscheduler() is the
source - who ever calls that?

The correct locking hierarchy is, I think:

        spin_lock(runqueue_lock)
        read/write_lock(tasklist_lock)
        read/write_unlock(tasklist_lock)
        spin_unlock(runqueue_lock)

Jasper, as a random stab in the dark you may care to try this:

--- linux-2.4.0-test11-pre4/kernel/exit.c       Sun Oct 15 01:27:46 2000
+++ linux-akpm/kernel/exit.c    Mon Nov 13 22:05:37 2000
@@ -381,8 +381,10 @@
         *      jobs, send them a SIGHUP and then a SIGCONT.  (POSIX 3.2.2.2)
         */
 
-       write_lock_irq(&tasklist_lock);
+       read_lock_irq(&tasklist_lock);
        do_notify_parent(current, current->exit_signal);
+       read_unlock_irq(&tasklist_lock);
+       write_lock_irq(&tasklist_lock);
        while (current->p_cptr != NULL) {
                p = current->p_cptr;
                current->p_cptr = p->p_osptr;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Reply via email to