On Thu, 2015-07-30 at 07:42 +0200, Eric Dumazet wrote:
> On Thu, 2015-07-30 at 01:41 +0000, Gregory Hoggarth wrote:
> > Hi,
> >
> > My company has also started having what appears to be the same problem,
> > since we upgraded our embedded system to linux kernel 3.16.
> >
> > I tried applying the suggested fix of READ_ONCE (and also had to add in
> > the necessary code to compiler.h as 3.16 didn't have it) and
> > unfortunately it did not fix the issue at all.
> >
> > Unfortunately we do not have an easy reproduction method, and do not
> > know precisely what is going on in the system when the issue occurs. We
> > know it is a multicast UDP packet but that is about it. For us, the
> > crash happens during a critical stage in our system initialisation,
> > making additional debugging and instrumentation difficult. Our
> > reproduction rate is approximately 1 out of 100 test runs; testing
> > overnight we will usually see 3-5 instances of the crash happening. All
> > our attempts to increase the reproduction rate, or reproduce the issue
> > in a simpler/more controlled way have failed.
> >
> > Because we have customised the linux kernel, in some places radically,
> > we assumed this was just a problem only we were seeing, so we were
> > trying to fix it ourselves. Now that this appears to be a generic
> > problem upstream, we've simply disabled UDP early demux in our system
> > (since it's a new optimisation that we have lived without up till now)
> > and will wait for this issue to be fixed upstream instead.
> >
> > So I'm sharing the debug patch I've written to help gather data on what
> > is going on in the system, and some of the output we've gotten from the
> > debug, in case this is useful for anyone else who is seeing this problem
> > or would like to try and fix it.
> >
> > Feel free to ask questions, I'm not sure how much help I can be but will
> > do my best. We'll be happy to assist in testing any proposed fixes. I
> > also have some more examples of kernel oops and debug output if that
> > could be useful, although the debug is from earlier iterations of the
> > patch so that historical output is not as detailed as the output
> > generated by the latest version of the patch attached here.
> >
> > Thanks,
> > Greg Hoggarth
>
> CC UDP early demux author : Shawn Bohrer
>
> I believe this is a race condition with a dst escaping RCU protected
> region.
>
> I will send a patch.
>

Please try following fixes :

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604f9273..02baaa6d97b3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1778,9 +1778,10 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		struct dst_entry *dst = skb_dst(skb);
 		int ret;
 
-		if (unlikely(sk->sk_rx_dst != dst))
+		if (unlikely(sk->sk_rx_dst != dst)) {
+			skb_dst_force(skb);
 			udp_sk_rx_dst_set(sk, dst);
-
+		}
 		ret = udp_queue_rcv_skb(sk, skb);
 		sock_put(sk);
 		/* a return value > 0 means to resubmit the input, but
@@ -1995,7 +1996,7 @@ void udp_v4_early_demux(struct sk_buff *skb)
 
 	skb->sk = sk;
 	skb->destructor = sock_efree;
-	dst = sk->sk_rx_dst;
+	dst = READ_ONCE(sk->sk_rx_dst);
 
 	if (dst)
 		dst = dst_check(dst, 0);