Re: Soft lockup in inet_put_port on 4.6

2016-12-19 Thread Josef Bacik
> On Dec 19, 2016, at 11:52 PM, Eric Dumazet wrote: > > On Tue, 2016-12-20 at 03:40 +, Josef Bacik wrote: >>> On Dec 19, 2016, at 9:42 PM, Eric Dumazet wrote: >>> On Mon, 2016-12-19 at 18:07 -0800, Tom Herbert wrote: When sockets created with SO_REUSEPORT move to TW state they a

Re: Soft lockup in inet_put_port on 4.6

2016-12-19 Thread Eric Dumazet
On Tue, 2016-12-20 at 03:40 +, Josef Bacik wrote: > > On Dec 19, 2016, at 9:42 PM, Eric Dumazet wrote: > > > >> On Mon, 2016-12-19 at 18:07 -0800, Tom Herbert wrote: > >> > >> When sockets created with SO_REUSEPORT move to TW state they are placed > >> back on the tb->owners. fastreuse port i

Re: Soft lockup in inet_put_port on 4.6

2016-12-19 Thread Josef Bacik
> On Dec 19, 2016, at 9:42 PM, Eric Dumazet wrote: > >> On Mon, 2016-12-19 at 18:07 -0800, Tom Herbert wrote: >> >> When sockets created with SO_REUSEPORT move to TW state they are placed >> back on the tb->owners. fastreuse port is no longer set so we have >> to walk a potentially long list of sock

Re: Soft lockup in inet_put_port on 4.6

2016-12-19 Thread Eric Dumazet
On Mon, 2016-12-19 at 18:07 -0800, Tom Herbert wrote: > When sockets created with SO_REUSEPORT move to TW state they are placed > back on the tb->owners. fastreuse port is no longer set so we have > to walk a potentially long list of sockets in tb->owners to open a new > listener socket. I imagine this
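
The cost described in the message above can be illustrated with a toy userspace model. This is only a sketch under stated assumptions: `Sock`, `BindBucket`, and `can_bind` are hypothetical names invented for illustration, the conflict rule is deliberately simplified, and none of it is the kernel's actual code. It shows only why losing the fast-path flag turns each new bind into a walk over every socket still parked on the bucket's owner list.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sock:
    """Hypothetical stand-in for a socket attached to a bind bucket."""
    addr: str
    reuseport: bool
    timewait: bool = False


@dataclass
class BindBucket:
    """Toy model of a per-port bind bucket (not the kernel structure)."""
    port: int
    fastreuseport: bool = True
    owners: List[Sock] = field(default_factory=list)


def can_bind(tb: BindBucket, new: Sock) -> Tuple[bool, int]:
    """Return (allowed, sockets_scanned) for a bind attempt on tb.port."""
    # Fast path: the bucket-level flag lets a reuseport socket skip the scan.
    if tb.fastreuseport and new.reuseport:
        return True, 0
    # Slow path: compare against every owner, including time-wait entries.
    scanned = 0
    for sk in tb.owners:
        scanned += 1
        # Simplified conflict rule (assumption): same address conflicts
        # unless both sockets opted into port reuse.
        if sk.addr == new.addr and not (sk.reuseport and new.reuseport):
            return False, scanned
    return True, scanned


def demo(n_timewait: int = 100_000) -> Tuple[int, int]:
    """Compare scan lengths with and without the fast-path flag."""
    tb = BindBucket(port=443)
    tb.owners = [Sock("0.0.0.0", True, timewait=True) for _ in range(n_timewait)]
    listener = Sock("0.0.0.0", True)
    _, fast = can_bind(tb, listener)
    tb.fastreuseport = False  # models the flag no longer being set
    _, slow = can_bind(tb, listener)
    return fast, slow
```

In this model `demo()` scans nothing while the flag is set and scans every time-wait entry once it is cleared. If that walk runs while a bucket lock is held, as the thread suggests, its length bounds how long other users of the lock must wait.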

Re: Soft lockup in inet_put_port on 4.6

2016-12-19 Thread Tom Herbert
On Mon, Dec 19, 2016 at 5:56 PM, David Miller wrote: > From: Josef Bacik > Date: Sat, 17 Dec 2016 13:26:00 + > >> So take my current duct tape fix and augment it with more >> information in the bind bucket? I'm not sure how to make this work >> without at least having a list of the bound ad

Re: Soft lockup in inet_put_port on 4.6

2016-12-19 Thread David Miller
From: Josef Bacik Date: Sat, 17 Dec 2016 13:26:00 + > So take my current duct tape fix and augment it with more > information in the bind bucket? I'm not sure how to make this work > without at least having a list of the bound addrs as well to make > sure we are really ok. I suppose we cou

Re: Soft lockup in inet_put_port on 4.6

2016-12-17 Thread Josef Bacik
> On Dec 17, 2016, at 6:09 AM, Hannes Frederic Sowa > wrote: > >> On 16.12.2016 23:50, Josef Bacik wrote: >>> On Fri, Dec 16, 2016 at 5:18 PM, Tom Herbert wrote: On Fri, Dec 16, 2016 at 2:08 PM, Josef Bacik wrote: > On Fri, Dec 16, 2016 at 10:21 AM, Josef Bacik wrote: > >>

Re: Soft lockup in inet_put_port on 4.6

2016-12-17 Thread Hannes Frederic Sowa
On 16.12.2016 23:50, Josef Bacik wrote: > On Fri, Dec 16, 2016 at 5:18 PM, Tom Herbert wrote: >> On Fri, Dec 16, 2016 at 2:08 PM, Josef Bacik wrote: >>> On Fri, Dec 16, 2016 at 10:21 AM, Josef Bacik wrote: On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacik wrote: > > On Thu, Dec

Re: Soft lockup in inet_put_port on 4.6

2016-12-16 Thread Josef Bacik
On Fri, Dec 16, 2016 at 5:18 PM, Tom Herbert wrote: On Fri, Dec 16, 2016 at 2:08 PM, Josef Bacik wrote: On Fri, Dec 16, 2016 at 10:21 AM, Josef Bacik wrote: On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacik wrote: On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa wrote: Hi Josef,

Re: Soft lockup in inet_put_port on 4.6

2016-12-16 Thread Tom Herbert
On Fri, Dec 16, 2016 at 2:08 PM, Josef Bacik wrote: > On Fri, Dec 16, 2016 at 10:21 AM, Josef Bacik wrote: >> >> On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacik wrote: >>> >>> On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa >>> wrote: Hi Josef, On 15.12.2016 19:53, Josef

Re: Soft lockup in inet_put_port on 4.6

2016-12-16 Thread Josef Bacik
On Fri, Dec 16, 2016 at 10:21 AM, Josef Bacik wrote: On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacik wrote: On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa wrote: Hi Josef, On 15.12.2016 19:53, Josef Bacik wrote: On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: On Tue, Dec 13, 201

Re: Soft lockup in inet_put_port on 4.6

2016-12-16 Thread Josef Bacik
On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacik wrote: On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa wrote: Hi Josef, On 15.12.2016 19:53, Josef Bacik wrote: On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek wrote: On Tue, Dec 13,

Re: Soft lockup in inet_put_port on 4.6

2016-12-16 Thread Josef Bacik
On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa wrote: Hi Josef, On 15.12.2016 19:53, Josef Bacik wrote: On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek wrote: On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote: I think the

Re: Soft lockup in inet_put_port on 4.6

2016-12-15 Thread Hannes Frederic Sowa
Hi Josef, On 15.12.2016 19:53, Josef Bacik wrote: > On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: >> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek >> wrote: >>> On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert >>> wrote: I think there may be some suspicious code in inet_csk_get_port. A

Re: Soft lockup in inet_put_port on 4.6

2016-12-15 Thread Craig Gallek
On Thu, Dec 15, 2016 at 5:39 PM, Tom Herbert wrote: > On Thu, Dec 15, 2016 at 10:53 AM, Josef Bacik wrote: >> On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: >>> >>> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek >>> wrote: On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote:

Re: Soft lockup in inet_put_port on 4.6

2016-12-15 Thread Tom Herbert
On Thu, Dec 15, 2016 at 10:53 AM, Josef Bacik wrote: > On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: >> >> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek >> wrote: >>> >>> On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert >>> wrote: I think there may be some suspicious code in inet_

Re: Soft lockup in inet_put_port on 4.6

2016-12-15 Thread Josef Bacik
On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek wrote: On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote: I think there may be some suspicious code in inet_csk_get_port. At tb_found there is: if (((tb->fastreuse > 0 && r

Re: Soft lockup in inet_put_port on 4.6

2016-12-13 Thread Tom Herbert
On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek wrote: > On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote: >> I think there may be some suspicious code in inet_csk_get_port. At >> tb_found there is: >> >> if (((tb->fastreuse > 0 && reuse) || >> (tb->fastreusep

Re: Soft lockup in inet_put_port on 4.6

2016-12-13 Thread Craig Gallek
On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote: > I think there may be some suspicious code in inet_csk_get_port. At > tb_found there is: > > if (((tb->fastreuse > 0 && reuse) || > (tb->fastreuseport > 0 && > !rcu_access_pointer(sk->sk

Re: Soft lockup in inet_put_port on 4.6

2016-12-13 Thread Tom Herbert
I think there may be some suspicious code in inet_csk_get_port. At tb_found there is: if (((tb->fastreuse > 0 && reuse) || (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_

Re: Soft lockup in inet_put_port on 4.6

2016-12-12 Thread Josef Bacik
On Mon, Dec 12, 2016 at 1:44 PM, Hannes Frederic Sowa wrote: On 12.12.2016 19:05, Josef Bacik wrote: On Fri, Dec 9, 2016 at 11:14 PM, Eric Dumazet wrote: On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote: Hmm... Does your ephemeral port range include the port your load balanci

Re: Soft lockup in inet_put_port on 4.6

2016-12-12 Thread Josef Bacik
On Mon, Dec 12, 2016 at 1:44 PM, Hannes Frederic Sowa wrote: On 12.12.2016 19:05, Josef Bacik wrote: On Fri, Dec 9, 2016 at 11:14 PM, Eric Dumazet wrote: On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote: Hmm... Does your ephemeral port range include the port your load balancing

Re: Soft lockup in inet_put_port on 4.6

2016-12-12 Thread Hannes Frederic Sowa
On 12.12.2016 19:05, Josef Bacik wrote: > On Fri, Dec 9, 2016 at 11:14 PM, Eric Dumazet > wrote: >> On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote: >> >>> >>> Hmm... Does your ephemeral port range include the port your load >>> balancing app is using? >> >> I suspect that you might have p

Re: Soft lockup in inet_put_port on 4.6

2016-12-12 Thread Josef Bacik
On Fri, Dec 9, 2016 at 11:14 PM, Eric Dumazet wrote: On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote: Hmm... Does your ephemeral port range include the port your load balancing app is using? I suspect that you might have processes doing bind(port = 0) that are trapped in the bind

Re: Soft lockup in inet_put_port on 4.6

2016-12-09 Thread Eric Dumazet
On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote: > > Hmm... Does your ephemeral port range include the port your load > balancing app is using? I suspect that you might have processes doing bind(port = 0) that are trapped in the bind_conflict() scan? With 100,000+ timewaits there, th
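
The question raised above (whether a service port lies inside the ephemeral range used by `bind(port = 0)`) can be checked from userspace. The following is a minimal sketch, assuming a Linux host that exposes `/proc/sys/net/ipv4/ip_local_port_range`. The helper names `bind_ephemeral`, `local_port_range`, and `in_ephemeral_range` are hypothetical and exist only for this example.

```python
import socket
from typing import Optional, Tuple


def bind_ephemeral(host: str = "127.0.0.1") -> socket.socket:
    """Bind a TCP socket to port 0 so the kernel picks a local port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((host, 0))
    return s


def local_port_range(
    path: str = "/proc/sys/net/ipv4/ip_local_port_range",
) -> Optional[Tuple[int, int]]:
    """Read the (low, high) ephemeral range, or None if unavailable."""
    try:
        with open(path) as f:
            lo, hi = f.read().split()
        return int(lo), int(hi)
    except (OSError, ValueError):
        return None


def in_ephemeral_range(port: int, rng: Tuple[int, int]) -> bool:
    """True if port falls inside the inclusive range rng."""
    lo, hi = rng
    return lo <= port <= hi


if __name__ == "__main__":
    s = bind_ephemeral()
    chosen = s.getsockname()[1]  # port the kernel selected
    rng = local_port_range()
    print("kernel chose port", chosen, "range", rng)
    s.close()
```

If the service's listening port falls inside the range returned by `local_port_range()`, sockets bound with port 0 may be steered to the same bucket as the service, which is the situation the message asks about.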

Re: Soft lockup in inet_put_port on 4.6

2016-12-09 Thread Eric Dumazet
On Fri, 2016-12-09 at 20:59 -0500, Josef Bacik wrote: > On Thu, Dec 8, 2016 at 8:01 PM, Josef Bacik wrote: > > > >> On Dec 8, 2016, at 7:32 PM, Eric Dumazet > >> wrote: > >> > >>> On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote: > >>> > >>> We can reproduce the problem at will, still

Re: Soft lockup in inet_put_port on 4.6

2016-12-09 Thread Josef Bacik
On Thu, Dec 8, 2016 at 8:01 PM, Josef Bacik wrote: On Dec 8, 2016, at 7:32 PM, Eric Dumazet wrote: On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote: We can reproduce the problem at will, still trying to run down the problem. I'll try and find one of the boxes that dumped a core

Re: Soft lockup in inet_put_port on 4.6

2016-12-08 Thread Josef Bacik
> On Dec 8, 2016, at 7:32 PM, Eric Dumazet wrote: > >> On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote: >> >> We can reproduce the problem at will, still trying to run down the >> problem. I'll try and find one of the boxes that dumped a core and get >> a bt of everybody. Thanks, > >

Re: Soft lockup in inet_put_port on 4.6

2016-12-08 Thread Eric Dumazet
On Thu, 2016-12-08 at 16:36 -0500, Josef Bacik wrote: > We can reproduce the problem at will, still trying to run down the > problem. I'll try and find one of the boxes that dumped a core and get > a bt of everybody. Thanks, OK, sounds good. I had a look and: - could not spot a fix that cam

Re: Soft lockup in inet_put_port on 4.6

2016-12-08 Thread Josef Bacik
On Thu, Dec 8, 2016 at 4:03 PM, Hannes Frederic Sowa wrote: Hello Tom, On Wed, Dec 7, 2016, at 00:06, Tom Herbert wrote: We are seeing a fair number of machines getting into soft lockups on the 4.6 kernel. As near as I can tell, this is happening on the spinlock in the bind hash bucket. When inet_csk

Re: Soft lockup in inet_put_port on 4.6

2016-12-08 Thread Hannes Frederic Sowa
Hello Tom, On Wed, Dec 7, 2016, at 00:06, Tom Herbert wrote: > We are seeing a fair number of machines getting into soft lockups on the 4.6 > kernel. As near as I can tell, this is happening on the spinlock in the > bind hash bucket. When inet_csk_get_port exits and does spin_unlock_bh, > the TCP timer runs an

Soft lockup in inet_put_port on 4.6

2016-12-06 Thread Tom Herbert
Hello, We are seeing a fair number of machines getting into soft lockups on the 4.6 kernel. As near as I can tell, this is happening on the spinlock in the bind hash bucket. When inet_csk_get_port exits and does spin_unlock_bh, the TCP timer runs and we hit the lockup in inet_put_port (presumably on the same lock). It