On Wed, 2016-02-10 at 12:43 +0000, Vaneet Narang wrote:
> Hi,
> 
> >What driver are you using (is that in-tree)? Can you reproduce the same
> >issue with a latest -net kernel, for example (or, a 'reasonably' recent
> >one like 4.3 or 4.4)? There has been quite a bit of changes in err queue
> >handling (which also accounts rmem) as well. How reliably can you
> >trigger the issue? Does it trigger with a completely different in-tree
> >network driver as well with your tests? Would be useful to track/debug
> >sk_rmem_alloc increases/decreases to see from which path new rmem is
> >being charged in the time between packet_release() and
> >packet_sock_destruct() for that socket ...
> >
> It seems like a race condition to us between packet_rcv() and
> packet_close(). We have tried to reproduce this issue by adding a delay
> in skb_set_owner_r(), and the issue gets reproduced quite frequently.
> We have added some traces, and on analysing them we have realised the
> following possible race condition.
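
For anyone following the accounting here, the two pieces involved are
roughly these (paraphrased from memory of the v4.x sources, so the exact
bodies may differ slightly): skb_set_owner_r() is what charges
sk->sk_rmem_alloc on the receive side, and packet_sock_destruct() is what
warns when that counter is still non-zero at destruction time.

    /* include/net/sock.h (trimmed) */
    static inline void skb_set_owner_r(struct sk_buff *skb, struct sock *sk)
    {
            skb_orphan(skb);
            skb->sk = sk;
            skb->destructor = sock_rfree;
            /* the reporters injected their delay around here for the repro */
            atomic_add(skb->truesize, &sk->sk_rmem_alloc);
            sk_mem_charge(sk, skb->truesize);
    }

    /* net/packet/af_packet.c (trimmed) */
    static void packet_sock_destruct(struct sock *sk)
    {
            WARN_ON(atomic_read(&sk->sk_receive_queue.qlen));
            WARN_ON(atomic_read(&sk->sk_rmem_alloc)); /* the warning being hit */
            ...
    }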

Even if you add a delay in skb_set_owner_r(), this should not allow the
dismantle phase to complete, since at least one cpu is still inside an
rcu_read_lock() section.
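
To be concrete about where that rcu_read_lock() comes from: packet_rcv()
is only ever invoked from the protocol-handler delivery in the core
receive path, which runs inside an RCU read-side critical section.
Heavily trimmed sketch (details vary between kernel versions):

    /* net/core/dev.c (trimmed) */
    static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
    {
            struct packet_type *ptype, *pt_prev = NULL;
            int ret;

            rcu_read_lock();
            ...
            list_for_each_entry_rcu(ptype, &ptype_all, list) {
                    if (pt_prev)
                            ret = deliver_skb(skb, pt_prev, orig_dev);
                    pt_prev = ptype;
            }
            ...
            if (pt_prev)
                    /* for an AF_PACKET socket this is packet_rcv() */
                    ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);

            rcu_read_unlock();
            return ret;
    }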

synchronize_rcu() can only complete once all cpus have passed through an
RCU quiescent state.

packet_close() should certainly not be called while another cpu is still
in the middle of packet_rcv().
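
That is exactly what the dismantle side relies on. Roughly (again trimmed
from net/packet/af_packet.c, v4.x era):

    static int packet_release(struct socket *sock)
    {
            ...
            spin_lock(&po->bind_lock);
            unregister_prot_hook(sk, false); /* no new packet_rcv() after this */
            ...
            spin_unlock(&po->bind_lock);
            ...
            synchronize_net();      /* waits for every cpu still in packet_rcv() */
            ...
            sock_orphan(sk);
            skb_queue_purge(&sk->sk_receive_queue);
            ...
            sock_put(sk);           /* can end in packet_sock_destruct() */
            return 0;
    }

By the time sock_put() can drop the last reference, no cpu should be left
inside packet_rcv() charging sk_rmem_alloc for this socket.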

Your patch does not address the root cause.

