Stephen Hemminger wrote:
> On Tue, 27 Nov 2007 14:34:44 -0800
> "Kok, Auke" <[EMAIL PROTECTED]> wrote:
>
>> Stephen Hemminger wrote:
>>> On Tue, 27 Nov 2007 19:52:24 +0100
>>> Robert Olsson <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hello!
>>>>
>>>> I've discovered a bug while testing the new multiQ NAPI code. In hi-load
>>>> situations when we take down an interface we get a kernel panic. The
>>>> oops is below.
>>>>
>>>> From what I see this happens when driver does napi_disable() and clears
>>>> NAPI_STATE_SCHED. In net_rx_action there is a check for work == weight
>>>> a sort indirect test but that's now not enough to cover the load
>>>> situation.
>>>> where we have NAPI_STATE_SCHED cleared by e1000_down in my case and still
>>>> full quota. Latest git but I'll guess the is the same in all later kernels.
>>>> There might be different solutions... one variant is below:
>>> It is considered a driver bug in 2.6.24 to call netif_rx_complete (clear
>>> NAPI_STATE_SCHED)
>>> and do a full quota. That bug already had to be fixed in other drivers,
>>> look like e1000 has same problem.
>> Stephen,
>>
>> please enlighten me, can you e.g. show me a commit of other drivers where you
>> fixed this up?
>>
>> Thanks,
>>
>> Auke
>
> Author: David S. Miller <[EMAIL PROTECTED]> 2007-10-11 18:08:29
> Committer: David S. Miller <[EMAIL PROTECTED]> 2007-10-11 18:08:29
> Parent: b9f2c0440d806e01968c3ed4def930a43be248ad ([netdrvr] Stop using legacy
> hooks ->self_test_count, ->get_stats_count)
> Child: 266918303226cceac7eca38ced30f15f277bd89c ([SKY2]: status polling loop
> (post merge))
> Branches: master, origin
> Follows: v2.6.23
> Precedes: v2.6.24-rc1
>
>     [NET]: Fix NAPI completion handling in some drivers.
>
>     In order for the list handling in net_rx_action() to be
>     correct, drivers must follow certain rules as stated by
>     this comment in net_rx_action():
>
>         /* Drivers must not modify the NAPI state if they
>          * consume the entire weight.  In such cases this code
>          * still "owns" the NAPI instance and therefore can
>          * move the instance around on the list at-will.
>          */
>
>     A few drivers do not do this because they mix the budget checks
>     with reading hardware state, resulting in crashes like the one
>     reported by [EMAIL PROTECTED]
>
>     BNX2 and TG3 are taken care of here, SKY2 fix is from Stephen
>     Hemminger.
>
>     Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

OK, I'm not sure what went wrong there with e1000, but I'll send a patch in
a second.

Robert, please give that patch a try (it fixes a crash that I had here as
well) and let us know if it works for you.

Auke
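
For reference, the rule from the net_rx_action() comment quoted above can be
sketched in a few lines. This is a userspace mock only, not kernel code and
not the e1000 patch: struct mock_napi, mock_complete() and mock_poll() are
hypothetical names invented for illustration, with mock_complete() standing
in for netif_rx_complete() and the sched flag standing in for
NAPI_STATE_SCHED.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a NAPI instance; only two fields are modelled. */
struct mock_napi {
    bool sched;   /* models NAPI_STATE_SCHED */
    int pending;  /* packets waiting in the mock RX ring */
};

/* Models netif_rx_complete(): the driver gives up the instance. */
void mock_complete(struct mock_napi *n)
{
    n->sched = false;
}

/*
 * Poll routine that follows the rule: the state is only modified when
 * less than the full budget (weight) was consumed. If the whole budget
 * was used, the caller still "owns" the instance and may move it on its
 * list, so the SCHED state is left alone.
 */
int mock_poll(struct mock_napi *n, int budget)
{
    int work = 0;

    /* Assumption of this sketch: poll is only entered while scheduled. */
    assert(n->sched);

    while (work < budget && n->pending > 0) {
        n->pending--;
        work++;
    }

    if (work < budget)
        mock_complete(n);

    return work;
}
```

With 100 packets pending and a budget of 64, the first mock_poll() call
returns 64 and leaves sched set; the second returns 36 and clears it. The
point of the sketch is that the completion decision depends only on work
versus budget, rather than being mixed with reads of hardware state.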