2011/10/12 Mikolaj Golub wrote: > > On Wed, 12 Oct 2011 09:53:34 +0800 dave jones wrote: > > dj> On Fri, Oct 7, 2011 at 9:12 AM, dave jones wrote: > >> 2011/10/4 Mikolaj Golub : > >>> > >>> On Sat, 1 Oct 2011 14:15:45 +0800 dave jones wrote: > >>> > >>> dj> On Fri, Sep 30, 2011 at 9:41 PM, Robert Watson wrote: > >>> >> > >>> >> On Wed, 28 Sep 2011, Mikolaj Golub wrote: > >>> >> > >>> >>> On Mon, 26 Sep 2011 16:12:55 +0200 K. Macy wrote: > >>> >>> > >>> >>> KM> Sorry, didn't look at the images (limited bw), I've seen > something KM> > >>> >>> like this before in timewait. This "can't happen" with UDP so will > be KM> > >>> >>> interested in learning more about the bug. > >>> >>> > >>> >>> The panic can be easily triggered by this: > >>> >> > >>> >> Hi: > >>> >> > >>> >> Just catching up on this thread. I think the analysis here is > generally > >>> >> right: in 9.0, you're much more likely to see an inpcb with its > in_socket > >>> >> pointer cleared in the hash list than in prior releases, and > >>> >> in_pcbbind_setup() trips over this. > >>> >> > >>> >> However, at least on first glance (and from the perspective of > invariants > >>> >> here), I think the bug is actualy that in_pcbbind_setup() is asking > >>> >> in_pcblookup_local() for an inpcb and then access the returned > inpcb's > >>> >> in_socket pointer without acquiring a lock on the inpcb. > Structurally, it > >>> >> can't acquire this lock for lock order reasons -- it already holds > the lock > >>> >> on its own inpcb. Therefore, we should only access fields that are > safe to > >>> >> follow in an inpcb when you hold a reference via the hash lock and > not a > >>> >> lock on the inpcb itself, which appears generally OK (+/-) for all > the > >>> >> fields in that clause but the t->inp_socket->so_options dereference. > >>> >> > >>> >> A preferred fix would cache the SO_REUSEPORT flag in an inpcb-layer > field, > >>> >> such as inp_flags2, giving us access to its value without having to > walk > >>> >> into the attached (or not) socket. > >>> >> > >>> >> This raises another structural question, which is whether we need a > new > >>> >> inp_foo flags field that is protected explicitly by the hash lock, > and not > >>> >> by the inpcb lock, which could hold fields relevant to address > binding. I > >>> >> don't think we need to solve that problem in this context, as a > slightly > >>> >> race on SO_REUSEPORT is likely acceptable. > >>> >> > >>> >> The suggested fix does perform the desired function of explicitly > detaching > >>> >> the inpcb from the hash list before the socket is disconnected from > the > >>> >> inpcb. However, it's incomplete in that the invariant that's being > broken is > >>> >> also relied on for other protocols (such as raw sockets). The > correct > >>> >> invariant is that inp_socket is safe to follow unconditionally if an > inpcb > >>> >> is locked and INP_DROPPED isn't set -- the bug is in "locked" not in > >>> >> "INP_DROPPED", which is why I think this is the wrong fix, even > though it > >>> >> prevents a panic :-). > >>> > >>> dj> Hello Robert, > >>> > >>> dj> Thank you for taking your valuable time to find out the problem. > >>> dj> Since I don't have idea about network internals, would you have a > patch > >>> dj> about this? I'd be glad to test it, thanks again. > >>> > >>> Here is the patch that implements what Robert suggests. > >>> > >>> Dave, could you test it? > >> > >> Sure. Thanks for cooking the patch. > >> Machines have been running two days now without panic. > > Thank you for testing it. > > dj> Is there any plan to commit your fix? Thank you. > dj> I'd upgrade to 9.0-release from beta-2 once it's released. > > I have an upgraded version of the patch, which is under review now. I have > been waiting for the response before asking you to test it, but it would be > great if you try it not waiting :-). > > As it was pointed by Robert the previous version introduced a regression: > SO_REUSEPORT was ignored if setsockopt was called after bind (the old cached > value was still used). So the updated version fixes this and also contains > several other fixes, the most important among them is that it fixes the panic > for IPv6 bind case too.
Thanks for cooking the patch. Machines have been running up days without any panic. > -- > Mikolaj Golub Regards, Dave. _______________________________________________ freebsd-net@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"