Hi Dave, After applying Roland's NAPI patch, system panics when I run multiple thread iperf (no stack trace at this time, it shows that the panic is in net_tx_action).
I think the problem is: In the "done < budget" case, ipoib_poll calls netif_rx_complete() netif_rx_complete() __netif_rx_complete() __napi_complete() list_del() __list_del() entry->next = LIST_POISON1; entry->prev = LIST_POISON2; Due to race with another completion (explained at end of the patch), net_rx_action finds quota==0 (even though done < budget earlier): net_rx_action() if (unlikely(!n->quota)) { n->quota = n->weight; list_move_tail() __list_del(POISON, POISON) <PANIC> } while IPoIB calling netif_rx_reschedule() works fine due to: netif_rx_reschedule __netif_rx_schedule __napi_schedule list_add_tail (this is OK) Patch that fixes this: diff -ruNp a/include/linux/netdevice.h b/include/linux/netdevice.h --- a/include/linux/netdevice.h 2007-09-19 16:50:35.000000000 +0530 +++ b/include/linux/netdevice.h 2007-09-19 16:51:28.000000000 +0530 @@ -346,7 +346,7 @@ static inline void napi_schedule(struct static inline void __napi_complete(struct napi_struct *n) { BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state)); - list_del(&n->poll_list); + __list_del(&n->poll_list); smp_mb__before_clear_bit(); clear_bit(NAPI_STATE_SCHED, &n->state); } When I apply this patch, things work fine but I get napi_check_quota_bug() warning. This race seems to happen as follows: CPU#1: ipoib_poll(budget=100) { A. process 100 skbs B. netif_rx_complete() <Process unrelated interrupts; executes slower than steps C, D, E on CPU#2> F. ib_req_notify_cq() (no missed completions, do nothing) G. return 100 H. return to net_rx_action, quota=99, subtract 100, quota=-1, BUG. } CPU#2: ipoib_ib_completion() : (starts and finishes entire line of execution *after* step B and *before* H executes). { C. New skb comes, call netif_rx_schedule; set quota=100 D. do ipoib_poll(), process one skb, return work=1 to net_rx_action E. net_rx_action: set quota=99 } The reason why both cpu's can execute poll simultaneously is because netpoll_poll_lock() returns NULL (dev->npinfo == NULL). This results in negative napi refcount and the warning. I verified this is the reason by saving the original quota before calling poll (in net_tx_action) and comparing with final after poll (before it gets updated), and it gets changed very often in multiple thread testing (atleast 4 threads, haven't seen with 2). In most cases, the quota becomes -1, and I have seen upto -9 but those are rarer. Note: during steps F-H and C-E, priv/napi is read/modified by both cpu's which is another bug relating to the same race. I guess the above patch is not required if this bug (in IPoIB) is fixed? Roland, why cannot we get rid of "poll_more" ? We will get called again after netif_rx_reschedule, and it is cleaner to let the new execution handle fresh completions. Is there a reason why this goto is required? Thanks, - KK - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html