> On Dec 8, 2016, at 7:32 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>
>> On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote:
>>
>> We can reproduce the problem at will, still trying to run down the
>> problem. I'll try and find one of the boxes that dumped a core and get
>> a bt of everybody. Thanks,
>
> OK, sounds good.
>
> I had a look and :
> - could not spot a fix that came after 4.6.
> - could not spot an obvious bug.
>
> Anything special in the program triggering the issue ?
> SO_REUSEPORT and/or special socket options ?
>
So they recently started using SO_REUSEPORT, and that's what triggered
it; if they don't use it, everything is fine. I added some
instrumentation to get_port to see if it was looping in there, and none
of my printks triggered.

The softlockup messages are always on the inet_bind_bucket lock,
sometimes in process context in get_port, sometimes in softirq context
through either inet_put_port or inet_kill_twsk.

On the box that I have a coredump for, there's only one processor in
the inet code, so I'm not sure what to make of that. That was a box from
last week, so I'll look at a more recent core and see if it's different.

Thanks,

Josef