On Thu, 2015-07-30 at 01:41 +0000, Gregory Hoggarth wrote:
> Hi,
>
> My company has also started having what appears to be the same problem,
> since we upgraded our embedded system to linux kernel 3.16.
>
> I tried applying the suggested fix of READ_ONCE (and also had to add in the
> necessary code to compiler.h as 3.16 didn't have it) and unfortunately it
> did not fix the issue at all.
>
> Unfortunately we do not have an easy reproduction method, and do not know
> precisely what is going on in the system when the issue occurs. We know it
> is a multicast UDP packet but that is about it. For us, the crash happens
> during a critical stage in our system initialisation, making additional
> debugging and instrumentation difficult. Our reproduction rate is
> approximately 1 out of 100 test runs; testing overnight we will usually see
> 3-5 instances of the crash happening. All our attempts to increase the
> reproduction rate, or reproduce the issue in a simpler/more controlled way
> have failed.
>
> Because we have customised the linux kernel, in some places radically, we
> assumed this was just a problem only we were seeing, so we were trying to
> fix it ourselves. Now that this appears to be a generic problem upstream,
> we've simply disabled UDP early demux in our system (since it's a new
> optimisation that we have lived without up till now) and will wait for this
> issue to be fixed upstream instead.
>
> So I'm sharing the debug patch I've written to help gather data on what is
> going on in the system, and some of the output we've gotten from the debug,
> in case this is useful for anyone else who is seeing this problem or would
> like to try and fix it.
>
> Feel free to ask questions, I'm not sure how much help I can be but will do
> my best. We'll be happy to assist in testing any proposed fixes. I also
> have some more examples of kernel oops and debug output if that could be
> useful, although the debug is from earlier iterations of the patch so that
> historical output is not as detailed as the output generated by the latest
> version of the patch attached here.
>
> Thanks,
> Greg Hoggarth

CC UDP early demux author : Shawn Bohrer

I believe this is a race condition with a dst escaping RCU protected region.

I will send a patch.