Brian Gerst wrote:
> 
> Mircea Damian wrote:
> >
> > Hello,
> >
> > I was expecting to receive some replies to my last desperate messages:
> > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg35446.html
> > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg36591.html
> >
> > My machine is dyeing in add_timer(). It seems to happen only on SMP
> > machines and is something related to the network driver. For some reason
> >  one of the timer lists gets broken so we (we are two people trying to
> >  solve this issue) wrote a "safe" timer.c which tries to rebuild the chain
> >  in case it hits a NULL pointer.
> >
> > The machine is (ofcourse) slower with this patch but at least it works.
> >
> > Maybe someone can see which is the real bug and fix it.
> >
> > Please help!
> 
> I found (at least part of) the problem.  In detach_timer() we test if
> the timer is pending.  If it is not the function does not remove the
> timer from the list and returns 0.  The functions that call
> detach_timer() do not check the return value and unconditionally set the
> list pointers to NULL, even though the timer is still on the list.
> Patch against 2.4.3 attached, but there may be a better solution.
> 
uh, but the pending test is to check for NULL pointers, so while your
change consolidates some code, I don't think it changes anything, unless
you can find a place that calls detach_timer and doesn't clear the
pointers...

For what its worth, my look at this problem seems to indicate that the
new timer pointer is zero.  This would be a problem in the network code
somewhere.  I would guess that the whole structure is being released by
cpu X while cpu y is trying to set up a timer.  But then I don't really
know the network code (at all).  Just going by the error, defer of zero
in add_timer() which only looks at the timer to verify that the pointers
are zero.

George


George


> --
> 
>                                                 Brian Gerst
> 
>   ------------------------------------------------------------------------
> diff -urN linux-2.4.3/kernel/timer.c linux/kernel/timer.c
> --- linux-2.4.3/kernel/timer.c  Thu Dec 14 20:52:22 2000
> +++ linux/kernel/timer.c        Fri Apr 13 13:26:08 2001
> @@ -194,6 +194,7 @@
>         if (!timer_pending(timer))
>                 return 0;
>         list_del(&timer->list);
> +       timer->list.next = timer->list.prev = NULL;
>         return 1;
>  }
> 
> @@ -217,7 +218,6 @@
> 
>         spin_lock_irqsave(&timerlist_lock, flags);
>         ret = detach_timer(timer);
> -       timer->list.next = timer->list.prev = NULL;
>         spin_unlock_irqrestore(&timerlist_lock, flags);
>         return ret;
>  }
> @@ -246,7 +246,6 @@
> 
>                 spin_lock_irqsave(&timerlist_lock, flags);
>                 ret += detach_timer(timer);
> -               timer->list.next = timer->list.prev = 0;
>                 running = timer_is_running(timer);
>                 spin_unlock_irqrestore(&timerlist_lock, flags);
> 
> @@ -309,7 +308,6 @@
>                         data= timer->data;
> 
>                         detach_timer(timer);
> -                       timer->list.next = timer->list.prev = NULL;
>                         timer_enter(timer);
>                         spin_unlock_irq(&timerlist_lock);
>                         fn(data);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to