Re: 11.0 stuck on high network load

Julien Charbon Wed, 12 Oct 2016 02:50:01 -0700

 Hi Slawa,

On 10/12/16 10:40 AM, Slawa Olhovchenkov wrote:
> On Wed, Oct 12, 2016 at 10:18:18AM +0200, Julien Charbon wrote:
>> On 10/11/16 2:11 PM, Slawa Olhovchenkov wrote:
>>> On Tue, Oct 11, 2016 at 09:20:17AM +0200, Julien Charbon wrote:
>>>>  Then threads are competing for the INP_WLOCK lock.  For the example,
>>>> let's say the thread A wants to run tcp_input()/in_pcblookup_mbuf() and
>>>> racing for this INP_WLOCK:
>>>>
>>>> https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/in_pcb.c#L1964
>>>>
>>>>  And thread B wants to run tcp_timer_2msl()/tcp_close()/in_pcbdrop() and
>>>> racing for this INP_WLOCK:
>>>>
>>>> https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/tcp_timer.c#L323
>>>>
>>>>  That leads to two cases:
>>>>
>>>>  o Thread A wins the race:
>>>>
>>>>   Thread A will continue tcp_input() as usal and INP_DROPPED flags is
>>>> not set and inp is still in TCP hash table.
>>>>   Thread B is waiting on thread A to release INP_WLOCK after finishing
>>>> tcp_input() processing, and thread B will continue
>>>> tcp_timer_2msl()/tcp_close()/in_pcbdrop() processing.
>>>>
>>>>  o Thread B wins the race:
>>>>
>>>>   Thread B runs tcp_timer_2msl()/tcp_close()/in_pcbdrop() and inp
>>>> INP_DROPPED is set and inp being removed from TCP hash table.
>>>>   In parallel, thread A has found the inp in TCP hash before is was
>>>> removed, and waiting on the found inp INP_WLOCK lock.
>>>>   Once thread B has released the INP_WLOCK lock, thread A gets this lock
>>>> and sees the INP_DROPPED flag and do "goto findpcb" but here because the
>>>> inp is not more in TCP hash table and it will not be find again by
>>>> in_pcblookup_mbuf().
>>>>
>>>>  Hopefully I am clear enough here.
>>>
>>> Thanks, very clear.
>>> Small qeustion: when both thread run on same CPU core, INP_WLOCK will
>>> be re-schedule?
>>
>>  Hmm, a thread can re-scheduled but not a lock. Thus no sure I
>> understand your question here.  :)
> 
> I am don't know how work INP_WLOCK in this case (all on same cpu):
> 
> thread1: INP_WLOCK
> -interrupt--
> thread2: INP_WLOCK
> 
> if INP_WLOCK is like spinlock -- this is dead lock.
> if INP_WLOCK is like mutex -- thread1 resheduled.


 Thanks, I understand you question now.  No an interrupt cannot bypass a
lock:  Here INP_WLOCK is like mutex -- thread1 resheduled.

>>> As I remeber race created by call tcp_twstart() at time of end
>>> tcp_close(), at path sofree()-tcp_usr_detach() and unexpected
>>> INP_TIMEWAIT state in the tcp_usr_detach(). INP_TIMEWAIT set in 
>>> tcp_twstart()
>>
>>  Exactly, thus the current fix is:  If you already have the INP_DROPPED
>> flag set you are not allowed to call tcp_twstart(), actually it is a
>> good candidate for a new INVARIANT.  Let me add that.
>>
>>> After check source code I am found invocation of tcp_twstart() in
>>> sys/netinet/tcp_stacks/fastpath.c, sys/netinet/tcp_input.c,
>>> sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c, sys/dev/cxgbe/tom/t4_cpl_io.c.
>>>
>>> Invocation from sys/netinet/tcp_stacks/fastpath.c and
>>> sys/netinet/tcp_input.c guarded by INP_WLOCK in tcp_input(), and now
>>> will be OK.
>>>
>>> Invocation from sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c and
>>> sys/dev/cxgbe/tom/t4_cpl_io.c is not clear to me, I am see independed
>>> INP_WLOCK. Is this OK?
>>>
>>> Can be thread A wants do_peer_close() directed from chelsio IRQ
>>> handler, bypass tcp_input()?
>>
>>  If you look carefully INP_WLOCK is used in cxgb_cpl_io.c and
>> t4_cpl_io.c before calling tcp_twstart().
> 
> Yes, and you remeber: sys/netinet/tcp_subr.c
> 
>   1535  struct tcpcb *
>   1536  tcp_close(struct tcpcb *tp)
>   1537  {
> ...
>   1569                  INP_WUNLOCK(inp);
>   1570                  ACCEPT_LOCK();
>   1571                  SOCK_LOCK(so);
>   1572                  so->so_state &= ~SS_PROTOREF;
>   1573                  sofree(so);
>   1574                  return (NULL);
> 
> sofree() call tcp_usr_detach() and in tcp_usr_detach() we have
> unexpected INP_TIMEWAIT.

 I see, thus just for the context:  The TCP stack in sys/dev/cxgb* is a
TOE (TCP Offload Engine?) TCP stack for Chelsio NICs, it is a
separate/side TCP stack that is used only with TCP_OFFLOAD option.

 This TOE TCP stack actually has its own set of detach()/input()
functions and seems to check INP_DROPPED flag properly.  I guess @np
check fixes in socket TCP stack and decides which one can also impact
the Chelsio TOE TCP stack.  Some bugs are only in socket TCP stack, some
are only in TOE TCP stack.

--
Julien

signature.asc
Description: OpenPGP digital signature

Re: 11.0 stuck on high network load

Reply via email to