From: Jesse Brandeburg <[EMAIL PROTECTED]>
Date: Fri, 14 Apr 2006 15:55:10 -0700 (Pacific Daylight Time)
> I'm trying to isolate more of a reproduction case, I'll be sure to
> post if I can find anything with more detail.

I think I see the bug.

If tbench with large numbers of clients is part of what helps
reproduce it, the key might be hitting the memory limits in tcp_mem[]
and friends, or something to do with concurrent access to
sk->sk_forward_alloc.  I bet there is some race in there.  A lot of
the action is in net/core/stream.c

We modify sk->sk_forward_alloc non-atomically, but that should be OK
since we ought to be holding all of the correct locks when we hit
these accesses.  Still, it is the first thing to audit.

Let's look at sk_stream_rfree(), as it is invoked from SKB freeing
callbacks and is the most likely suspect for these kinds of problems.
It is hooked up to skb->destructor by sk_stream_set_owner_r() and then
invoked via __kfree_skb().

Nothing here takes any locks, and as stated above we modify
sk->sk_forward_alloc non-atomically, so this is the bug.  Shit.

I'll think about how to fix this in the least invasive manner.  I also
want to search the changelog history to see whether this race was
always present or whether it was "introduced" at some point.

Making sk->sk_forward_alloc an atomic_t would be incredibly expensive,
so I'll try to find a way to avoid that.  We may be able to just do a
bh_lock_sock()/bh_unlock_sock() around the body of sk_stream_rfree()
to fix this.
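For reference, the destructor path discussed above looked roughly like
this in 2.6-era kernels.  This is a sketch reconstructed from memory of
include/net/sock.h / net/core/stream.c, not copied verbatim from any
particular tree, and details may differ between versions:

	/* Sketch of the path under discussion.  sk_stream_set_owner_r()
	 * charges the skb against the socket's receive memory and installs
	 * sk_stream_rfree() as the destructor; __kfree_skb() later invokes
	 * that destructor to return the memory. */
	static inline void sk_stream_rfree(struct sk_buff *skb)
	{
		struct sock *sk = skb->sk;

		atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
		/* Non-atomic read-modify-write: this path takes no lock,
		 * so a concurrent writer can race with the update. */
		sk->sk_forward_alloc += skb->truesize;
	}

	static inline void sk_stream_set_owner_r(struct sk_buff *skb,
						 struct sock *sk)
	{
		skb->sk = sk;
		skb->destructor = sk_stream_rfree;
		atomic_add(skb->truesize, &sk->sk_rmem_alloc);
		sk->sk_forward_alloc -= skb->truesize;
	}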
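And a sketch of the least-invasive fix floated at the end of the mail,
i.e. serializing the destructor's sk_forward_alloc update with the
socket's BH spinlock.  This is only an illustration of the idea, not a
submitted patch; whether it is safe in every context __kfree_skb() can
run from would still need to be audited:

	/* Illustration of the bh_lock_sock()/bh_unlock_sock() idea: take
	 * the socket's backlog spinlock around the non-atomic
	 * sk_forward_alloc update so concurrent writers cannot race with
	 * the destructor. */
	static inline void sk_stream_rfree(struct sk_buff *skb)
	{
		struct sock *sk = skb->sk;

		atomic_sub(skb->truesize, &sk->sk_rmem_alloc);

		bh_lock_sock(sk);
		sk->sk_forward_alloc += skb->truesize;
		bh_unlock_sock(sk);
	}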