Re: Soft lockup in e100 driver ?

Daniel Walker Tue, 09 Aug 2005 09:21:58 -0700

It looks like this might be an SMP race , it seem that both processors
are in e100_down(). There is a while loop in e100_clean_cbs() that
appears to have an unsafe looping condition .


It looks like cbs_avail might jump over params.cbs.count , then you
would have to wait for a rollover . Is this a PREEMPT_NONE kernel?


This patch may help, but it's not a complete fix.

--- linux-2.6.12.orig/drivers/net/e100.c        2005-08-05 16:45:59.000000000 
+0000
+++ linux-2.6.12/drivers/net/e100.c     2005-08-09 16:14:45.000000000 +0000
@@ -1393,7 +1393,7 @@ static inline int e100_tx_clean(struct n
 static void e100_clean_cbs(struct nic *nic)
 {
        if(nic->cbs) {
-               while(nic->cbs_avail != nic->params.cbs.count) {
+               while(nic->cbs_avail < nic->params.cbs.count) {
                        struct cb *cb = nic->cb_to_clean;
                        if(cb->skb) {
                                pci_unmap_single(nic->pdev,



On Tue, 2005-08-09 at 16:36 +0300, Matti Aarnio wrote:
> Running very recent Fedora Core Development kernel I can following
> soft-oops..   ( 2.6.12-1.1455_FC5smp )
> 
> 
> e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
> BUG: soft lockup detected on CPU#0!
> 
> Pid: 10743, comm:             ifconfig
> EIP: 0060:[<f88bf2f9>] CPU: 0
> EIP is at e100_clean_cbs+0x2f/0x12b [e100]
>  EFLAGS: 00000293    Not tainted  (2.6.12-1.1455_FC5smp)
> EAX: 495c7c2b EBX: 495c7c2b ECX: f6c311a0 EDX: 00000000
> ESI: 00000040 EDI: f6c30000 EBP: f71a4b20 DS: 007b ES: 007b
> CR0: 8005003b CR2: 0804a544 CR3: 01e9cd80 CR4: 000006f0
>  [<f88c0708>] e100_down+0x66/0x9a [e100]
>  [<f88c1623>] e100_close+0xa/0xd [e100]
>  [<c02b7adb>] dev_close+0x40/0x7e
>  [<c02b8f59>] dev_change_flags+0x46/0xf5
>  [<c02f76b3>] devinet_ioctl+0x564/0x5df
>  [<c02af22c>] sock_ioctl+0xc3/0x250
>  [<c02af169>] sock_ioctl+0x0/0x250
>  [<c01762ef>] do_ioctl+0x1f/0x6d
>  [<c017648f>] vfs_ioctl+0x50/0x1c6
>  [<c0176662>] sys_ioctl+0x5d/0x6f
>  [<c010394d>] syscall_call+0x7/0xb
>  [<c014473f>] softlockup_tick+0x6f/0x80
>  [<c01085b8>] timer_interrupt+0x2d/0x75
>  [<c01448dd>] handle_IRQ_event+0x2e/0x5a
>  [<c01449cb>] __do_IRQ+0xc2/0x127
>  [<c0105f7e>] do_IRQ+0x4e/0x86
>  =======================
>  [<c01160cc>] smp_apic_timer_interrupt+0xc1/0xca
>  [<c0104382>] common_interrupt+0x1a/0x20
>  [<f88bf2f9>] e100_clean_cbs+0x2f/0x12b [e100]
>  [<f88c0708>] e100_down+0x66/0x9a [e100]
>  [<f88c1623>] e100_close+0xa/0xd [e100]
>  [<c02b7adb>] dev_close+0x40/0x7e
>  [<c02b8f59>] dev_change_flags+0x46/0xf5
>  [<c02f76b3>] devinet_ioctl+0x564/0x5df
>  [<c02af22c>] sock_ioctl+0xc3/0x250
>  [<c02af169>] sock_ioctl+0x0/0x250
>  [<c01762ef>] do_ioctl+0x1f/0x6d
>  [<c017648f>] vfs_ioctl+0x50/0x1c6
>  [<c0176662>] sys_ioctl+0x5d/0x6f
>  [<c010394d>] syscall_call+0x7/0xb
> 
> 
> 
> Preconditions for this are:
> 
> - E100 card stopped working for some reason (no idea why, it just
>   does sometimes at this oldish 2x P-III machine)
> - There are active datastreams running in and out
>   (around 0.2 Mbps out, multiple megabits in.)
> - Commanding then "ifconfig eth0 down" results in what feels like 
>   system freezing, but it does recover in about 30-60 seconds
>   (it takes long enough for me to sweat bullets...)
> - While in freeze state, keyboard can go crazy, but mouse does
>   respond, as well as tvtime shows bt848 captured live video.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Soft lockup in e100 driver ?

Reply via email to