Re: mlx4en, timer irq @100%...

2017-08-08 Thread Ben RUBSON
> On 08 Aug 2017, at 08:51, Hans Petter Selasky  wrote:
> 
> On 08/08/17 01:52, Ben RUBSON wrote:
>>> On 07 Aug 2017, at 19:57, Hans Petter Selasky  wrote:
>>> 
>>> On 08/07/17 19:19, Ben RUBSON wrote:
> On 07 Aug 2017, at 18:19, Matt Joras  wrote:
> 
> On 08/07/2017 09:11, Hans Petter Selasky wrote:
>> Hi,
>> 
>> Try to enter "kgdb" and run:
>> 
>> thread apply all bt
>> 
>> Look for the callout function in question.
>> 
>> --HPS
>> 
> If you don't have a way to attach kgdb handy you could also break into
> ddb(4) and run "alltrace". Though gdb would be more useful for an
> ongoing session if we need more than the backtrace since you could
> switch to that thread and investigate it directly.
> 
 Hi Hans & Matt,
 Thank you for your answers, glad to hear from you :)
 So here is the full kgdb(thread apply all bt) command log :
 https://benrubson.github.io/kgdb.log
 We found the faulty thread :
 # procstat -ak | grep "swi4.*tcp"
12 100029 intr swi4: clock (0)  tcp_tw_2msl_scan pfslowtimo 
 softclock_call_cc softclock intr_event_execute_handlers ithread_loop 
 fork_exit fork_trampoline
 # kgdb
 (...)
 Thread 747 (Thread 100029):
 #0  sched_switch (td=0xf8000f337500, newtd=0xf8010e144000, 
 flags=) at /usr/src/sys/kern/sched_ule.c:1973
 #1  0xfe1000f92d80 in ?? ()
 #2  0xfe0f8f74b6e0 in ?? ()
 #3  0x810bd274 in handleevents (now=, 
 fake=Error accessing memory address 0xffcc: Bad address.
 ) at /usr/src/sys/kern/kern_clocksource.c:223
 Previous frame inner to this frame (corrupt stack?)
 (...)
 Of course let me know if you need further info.
>>> 
>>> Can you try to dump "td":
>>> 
>>> set print pretty on
>>> thread 747
>>> frame 0
>>> print *td
>>> 
>>> It might give some more clues.
>> Here it is :
>> https://benrubson.github.io/td.log
>> Thx !
> 
> Can you show output from:
> 
> vmstat -z
> 
> Can you from kgdb do:
> 
> print V_twq_2msl
> 
> And follow the next link field and see where it goes?

Here is vmstat -z :
https://benrubson.github.io/vmstatz.log

"print V_twq_2msl" returns the following :
No symbol "V_twq_2msl" in current context.

Even if I rerun this before (as I exited kgdb) :
set print pretty on
thread 747
frame 0

Ben
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%...

2017-08-08 Thread Hans Petter Selasky

On 08/08/17 09:04, Ben RUBSON wrote:

"print V_twq_2msl" returns the following :
No symbol "V_twq_2msl" in current context.


Are you using VIMAGE ?

--HPS
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%...

2017-08-08 Thread Hans Petter Selasky

On 08/08/17 09:04, Ben RUBSON wrote:

Here is vmstat -z :
https://benrubson.github.io/vmstatz.log


From what I can see there are not TCP allocation failures. This rules 
out one class of bugs:




socket: 864, 2092652, 105, 371, 2318298,   0,   0
unpcb:  240, 2092656,  14,1506,   70953,   0,   0
ipq: 56, 127374,   0,   0,   0,   0,   0
udp_inpcb:  464, 2092656,  12,1044, 1515824,   0,   0
udpcb:   32, 2092750,  12,7363, 1515824,   0,   0
tcp_inpcb:  464, 2092656, 448, 432,  731341,   0,   0
tcpcb: 1040, 2092653,  35, 205,  731341,   0,   0
tcptw:   88,  27810, 413,3862,  161063,   0,   0
syncache:   168,  15364,   0,1357,   17611,   0,   0
hostcache:  128,  0,   0,   0,   0,   0,   0
sackhole:32,  0,   0,5625,   42406,   0,   0
tcpreass:40, 254700,   0,6900, 2276087,   0,   0




--HPS
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%...

2017-08-08 Thread Ben RUBSON

> On 08 Aug 2017, at 09:33, Hans Petter Selasky  wrote:
> 
> On 08/08/17 09:04, Ben RUBSON wrote:
>> "print V_twq_2msl" returns the following :
>> No symbol "V_twq_2msl" in current context.
> 
> Are you using VIMAGE ?

No, GENERIC FreeBSD 11.0 on a physical server.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%...

2017-08-08 Thread Hans Petter Selasky

On 08/08/17 09:37, Ben RUBSON wrote:



On 08 Aug 2017, at 09:33, Hans Petter Selasky  wrote:

On 08/08/17 09:04, Ben RUBSON wrote:

"print V_twq_2msl" returns the following :
No symbol "V_twq_2msl" in current context.


Are you using VIMAGE ?


No, GENERIC FreeBSD 11.0 on a physical server.


Can you try to figure out where this symbol is located. We need to have 
a dump of it.


--HPS

___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%...

2017-08-08 Thread Ben RUBSON

> On 08 Aug 2017, at 09:38, Hans Petter Selasky  wrote:
> 
> On 08/08/17 09:37, Ben RUBSON wrote:
>>> On 08 Aug 2017, at 09:33, Hans Petter Selasky  wrote:
>>> 
>>> On 08/08/17 09:04, Ben RUBSON wrote:
 "print V_twq_2msl" returns the following :
 No symbol "V_twq_2msl" in current context.
>>> 
>>> Are you using VIMAGE ?
>> No, GENERIC FreeBSD 11.0 on a physical server.
> 
> Can you try to figure out where this symbol is located. We need to have a 
> dump of it.

OK.
I'm quite (well, absolutely) new to kgdb, some clue on how I should proceed ?
Thank you !
Ben
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%...

2017-08-08 Thread Ben RUBSON

> On 08 Aug 2017, at 09:38, Hans Petter Selasky  wrote:
> 
> On 08/08/17 09:37, Ben RUBSON wrote:
>>> On 08 Aug 2017, at 09:33, Hans Petter Selasky  wrote:
>>> 
>>> On 08/08/17 09:04, Ben RUBSON wrote:
 "print V_twq_2msl" returns the following :
 No symbol "V_twq_2msl" in current context.
>>> 
>>> Are you using VIMAGE ?
>> No, GENERIC FreeBSD 11.0 on a physical server.
> 
> Can you try to figure out where this symbol is located. We need to have a 
> dump of it.

Perhaps we can't find the symbol due to the frame error which seems to be 
reported ?

Thread 747 (Thread 100029):
#0  sched_switch (td=0xf8000f337500, newtd=0xf8010e144000, flags=) at /usr/src/sys/kern/sched_ule.c:1973
#1  0xfe1000f92d80 in ?? ()
#2  0xfe0f8f74b6e0 in ?? ()
#3  0x810bd274 in handleevents (now=, fake=Error 
accessing memory address 0xffcc: Bad address.
) at /usr/src/sys/kern/kern_clocksource.c:223
Previous frame inner to this frame (corrupt stack?)

___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%...

2017-08-08 Thread Hans Petter Selasky

On 08/08/17 09:43, Ben RUBSON wrote:

OK.
I'm quite (well, absolutely) new to kgdb, some clue on how I should proceed ?
Thank you !
Ben


print twq_2msl
print *twq_2msl.tqh_first


--HPS
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%...

2017-08-08 Thread Ben RUBSON

> On 08 Aug 2017, at 09:54, Hans Petter Selasky  wrote:
> 
> On 08/08/17 09:43, Ben RUBSON wrote:
>> OK.
>> I'm quite (well, absolutely) new to kgdb, some clue on how I should proceed ?
>> Thank you !
>> Ben
> 
> print twq_2msl
> print *twq_2msl.tqh_first

(kgdb) set print pretty on
(kgdb) print twq_2msl
$1 = {
  tqh_first = 0xf80d1bf12210, 
  tqh_last = 0xf808b73433a8
}
(kgdb) print *twq_2msl.tqh_first
$2 = {
  tw_inpcb = 0xf8031c570740, 
  snd_nxt = 4140013171, 
  rcv_nxt = 3258392061, 
  iss = 4140006661, 
  irs = 3258390776, 
  last_win = 513, 
  tw_so_options = 12, 
  tw_cred = 0xf804ae6ca400, 
  t_recent = 1263366217, 
  ts_offset = 4279193351, 
  t_starttime = 606489428, 
  tw_time = 606516815, 
  tw_2msl = {
tqe_next = 0xf80ff71453c8, 
tqe_prev = 0x81d763a0
  }, 
  tw_pspare = 0x0, 
  tw_spare = 0x0
}
(kgdb) 



































































___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%...

2017-08-08 Thread Hans Petter Selasky

On 08/08/17 10:00, Ben RUBSON wrote:

kgdb) print *twq_2msl.tqh_first
$2 = {
   tw_inpcb = 0xf8031c570740,


print *twq_2msl.tqh_first->tw_inpcb

--HPS
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%...

2017-08-08 Thread Ben RUBSON
> On 08 Aug 2017, at 10:02, Hans Petter Selasky  wrote:
> 
> On 08/08/17 10:00, Ben RUBSON wrote:
>> kgdb) print *twq_2msl.tqh_first
>> $2 = {
>>   tw_inpcb = 0xf8031c570740,
> 
> print *twq_2msl.tqh_first->tw_inpcb

(kgdb) print *twq_2msl.tqh_first->tw_inpcb
$3 = {
  inp_hash = {
le_next = 0x0, 
le_prev = 0xfe000f78adb8
  }, 
  inp_pcbgrouphash = {
le_next = 0x0, 
le_prev = 0x0
  }, 
  inp_list = {
le_next = 0xf80c2a07f570, 
le_prev = 0x81e15e20
  }, 
  inp_ppcb = 0xf80d1bf12210, 
  inp_pcbinfo = 0x81e15e28, 
  inp_pcbgroup = 0x0, 
  inp_pcbgroup_wild = {
le_next = 0x0, 
le_prev = 0x0
  }, 
  inp_socket = 0x0, 
  inp_cred = 0xf804ae6ca400, 
  inp_flow = 0, 
  inp_flags = 92274688, 
  inp_flags2 = 16, 
  inp_vflag = 0 '\0', 
  inp_ip_ttl = 64 '@', 
  inp_ip_p = 0 '\0', 
  inp_ip_minttl = 0 '\0', 
  inp_flowid = 946611505, 
  inp_refcount = 2, 
  inp_pspare = 0xf8031c5707c0, 
  inp_flowtype = 191, 
  inp_rss_listen_bucket = 0, 
  inp_ispare = 0xf8031c5707f0, 
  inp_inc = {
inc_flags = 0 '\0', 
inc_len = 0 '\0', 
inc_fibnum = 0, 
inc_ie = {
  ie_fport = 53987, 
  ie_lport = 47873, 
  ie_dependfaddr = {
ie46_foreign = {
  ia46_pad32 = 0xf8031c570808, 
  ia46_addr4 = {
s_addr = 3011802202
  }
}, 
ie6_foreign = {
  __u6_addr = {
__u6_addr8 = 0xf8031c570808 "", 
__u6_addr16 = 0xf8031c570808, 
__u6_addr32 = 0xf8031c570808
  }
}
  }, 
  ie_dependladdr = {
ie46_local = {
  ia46_pad32 = 0xf8031c570818, 
  ia46_addr4 = {
s_addr = 4068705883
  }
}, 
ie6_local = {
  __u6_addr = {
__u6_addr8 = 0xf8031c570818 "", 
__u6_addr16 = 0xf8031c570818, 
__u6_addr32 = 0xf8031c570818
  }
}
  }, 
  ie6_zoneid = 0
}
  }, 
  inp_label = 0x0, 
  inp_sp = 0x0, 
  inp_depend4 = {
inp4_ip_tos = 0 '\0', 
inp4_options = 0x0, 
inp4_moptions = 0x0
  }, 
  inp_depend6 = {
inp6_options = 0x0, 
inp6_outputopts = 0x0, 
inp6_moptions = 0x0, 
inp6_icmp6filt = 0x0, 
inp6_cksum = 0, 
inp6_hops = 0
  }, 
  inp_portlist = {
le_next = 0xf80274298ae0, 
le_prev = 0xf800454999b0
  }, 
  inp_phd = 0xf800454999a0, 
  inp_gencnt = 2119756, 
  inp_lle = 0x0, 
  inp_lock = {
lock_object = {
  lo_name = 0x814e6940 "tcpinp", 
  lo_flags = 90898432, 
  lo_data = 0, 
  lo_witness = 0x0
}, 
rw_lock = 18446735277871559936
  }, 
  inp_rt_cookie = 10, 
  inp_rtu = {
inpu_route = {
  ro_rt = 0x0, 
  ro_lle = 0x0, 
  ro_prepend = 0x0, 
  ro_plen = 0, 
  ro_flags = 384, 
  ro_mtu = 0, 
  spare = 0, 
  ro_dst = {
sa_len = 16 '\020', 
sa_family = 2 '\002', 
sa_data = 0xf8031c5708f2 ""
  }
}, 
inpu_route6 = {
  ro_rt = 0x0, 
  ro_lle = 0x0, 
  ro_prepend = 0x0, 
  ro_plen = 0, 
  ro_flags = 384, 
  ro_mtu = 0, 
  spare = 0, 
  ro_dst = {
sin6_len = 16 '\020', 
sin6_family = 2 '\002', 
sin6_port = 0, 
sin6_flowinfo = 3011802202, 
sin6_addr = {
  __u6_addr = {
__u6_addr8 = 0xf8031c5708f8 "", 
__u6_addr16 = 0xf8031c5708f8, 
__u6_addr32 = 0xf8031c5708f8
  }
}, 
sin6_scope_id = 0
  }
}
  }
}
(kgdb) 

___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Hans Petter Selasky

On 08/08/17 10:06, Ben RUBSON wrote:

On 08 Aug 2017, at 10:02, Hans Petter Selasky  wrote:

On 08/08/17 10:00, Ben RUBSON wrote:

kgdb) print *twq_2msl.tqh_first
$2 = {
   tw_inpcb = 0xf8031c570740,


print *twq_2msl.tqh_first->tw_inpcb


(kgdb) print *twq_2msl.tqh_first->tw_inpcb
$3 = {
   inp_hash = {
 le_next = 0x0,
 le_prev = 0xfe000f78adb8
   },
   inp_pcbgrouphash = {
 le_next = 0x0,
 le_prev = 0x0
   },
   inp_list = {
 le_next = 0xf80c2a07f570,
 le_prev = 0x81e15e20
   },
   inp_ppcb = 0xf80d1bf12210,
   inp_pcbinfo = 0x81e15e28,
   inp_pcbgroup = 0x0,
   inp_pcbgroup_wild = {
 le_next = 0x0,
 le_prev = 0x0
   },
   inp_socket = 0x0,
   inp_cred = 0xf804ae6ca400,
   inp_flow = 0,
   inp_flags = 92274688,
   inp_flags2 = 16,
   inp_vflag = 0 '\0',
   inp_ip_ttl = 64 '@',
   inp_ip_p = 0 '\0',
   inp_ip_minttl = 0 '\0',
   inp_flowid = 946611505,
   inp_refcount = 2,
   inp_pspare = 0xf8031c5707c0,
   inp_flowtype = 191,
   inp_rss_listen_bucket = 0,
   inp_ispare = 0xf8031c5707f0,
   inp_inc = {
 inc_flags = 0 '\0',
 inc_len = 0 '\0',
 inc_fibnum = 0,
 inc_ie = {
   ie_fport = 53987,
   ie_lport = 47873,
   ie_dependfaddr = {
 ie46_foreign = {
   ia46_pad32 = 0xf8031c570808,
   ia46_addr4 = {
 s_addr = 3011802202
   }
 },
 ie6_foreign = {
   __u6_addr = {
 __u6_addr8 = 0xf8031c570808 "",
 __u6_addr16 = 0xf8031c570808,
 __u6_addr32 = 0xf8031c570808
   }
 }
   },
   ie_dependladdr = {
 ie46_local = {
   ia46_pad32 = 0xf8031c570818,
   ia46_addr4 = {
 s_addr = 4068705883
   }
 },
 ie6_local = {
   __u6_addr = {
 __u6_addr8 = 0xf8031c570818 "",
 __u6_addr16 = 0xf8031c570818,
 __u6_addr32 = 0xf8031c570818
   }
 }
   },
   ie6_zoneid = 0
 }
   },
   inp_label = 0x0,
   inp_sp = 0x0,
   inp_depend4 = {
 inp4_ip_tos = 0 '\0',
 inp4_options = 0x0,
 inp4_moptions = 0x0
   },
   inp_depend6 = {
 inp6_options = 0x0,
 inp6_outputopts = 0x0,
 inp6_moptions = 0x0,
 inp6_icmp6filt = 0x0,
 inp6_cksum = 0,
 inp6_hops = 0
   },
   inp_portlist = {
 le_next = 0xf80274298ae0,
 le_prev = 0xf800454999b0
   },
   inp_phd = 0xf800454999a0,
   inp_gencnt = 2119756,
   inp_lle = 0x0,
   inp_lock = {
 lock_object = {
   lo_name = 0x814e6940 "tcpinp",
   lo_flags = 90898432,
   lo_data = 0,
   lo_witness = 0x0
 },
 rw_lock = 18446735277871559936
   },
   inp_rt_cookie = 10,
   inp_rtu = {
 inpu_route = {
   ro_rt = 0x0,
   ro_lle = 0x0,
   ro_prepend = 0x0,
   ro_plen = 0,
   ro_flags = 384,
   ro_mtu = 0,
   spare = 0,
   ro_dst = {
 sa_len = 16 '\020',
 sa_family = 2 '\002',
 sa_data = 0xf8031c5708f2 ""
   }
 },
 inpu_route6 = {
   ro_rt = 0x0,
   ro_lle = 0x0,
   ro_prepend = 0x0,
   ro_plen = 0,
   ro_flags = 384,
   ro_mtu = 0,
   spare = 0,
   ro_dst = {
 sin6_len = 16 '\020',
 sin6_family = 2 '\002',
 sin6_port = 0,
 sin6_flowinfo = 3011802202,
 sin6_addr = {
   __u6_addr = {
 __u6_addr8 = 0xf8031c5708f8 "",
 __u6_addr16 = 0xf8031c5708f8,
 __u6_addr32 = 0xf8031c5708f8
   }
 },
 sin6_scope_id = 0
   }
 }
   }
}
(kgdb)



Hi,

Here is the conclusion:

The following code is going in an infinite loop:



for (;;) {
TW_RLOCK(V_tw_lock);
tw = TAILQ_FIRST(&V_twq_2msl);
if (tw == NULL || (!reuse && (tw->tw_time - ticks) > 0)) {
TW_RUNLOCK(V_tw_lock);
break;
}
KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == NULL",
__func__));

inp = tw->tw_inpcb;
in_pcbref(inp);
TW_RUNLOCK(V_tw_lock);

if (INP_INFO_TRY_RLOCK(&V_tcbinfo)) {

INP_WLOCK(inp);
tw = intotw(inp);
if (in_pcbrele_wlocked(inp)) {


in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in 
inp->inp_flags2. I guess you have invariants disabled, because the 
KASSERT() below should have caused a panic.



KASSERT(tw == NULL, ("%s: held last inp "
"reference but tw not NULL", __func__));
INP_INFO_RUNLOCK(&V_tcbinfo);
continue;
}


This is a regression issue after:


commit 5630210a7f1dbbd903b77b2aef939cd47c63da58

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Ben RUBSON

> On 08 Aug 2017, at 10:31, Hans Petter Selasky  wrote:
> 
> On 08/08/17 10:06, Ben RUBSON wrote:
>>> On 08 Aug 2017, at 10:02, Hans Petter Selasky  wrote:
>>> 
>>> On 08/08/17 10:00, Ben RUBSON wrote:
 kgdb) print *twq_2msl.tqh_first
 $2 = {
   tw_inpcb = 0xf8031c570740,
>>> 
>>> print *twq_2msl.tqh_first->tw_inpcb
>> (kgdb) print *twq_2msl.tqh_first->tw_inpcb
>> $3 = {
>>   inp_hash = {
>> le_next = 0x0,
>> le_prev = 0xfe000f78adb8
>>   },
>>   inp_pcbgrouphash = {
>> le_next = 0x0,
>> le_prev = 0x0
>>   },
>>   inp_list = {
>> le_next = 0xf80c2a07f570,
>> le_prev = 0x81e15e20
>>   },
>>   inp_ppcb = 0xf80d1bf12210,
>>   inp_pcbinfo = 0x81e15e28,
>>   inp_pcbgroup = 0x0,
>>   inp_pcbgroup_wild = {
>> le_next = 0x0,
>> le_prev = 0x0
>>   },
>>   inp_socket = 0x0,
>>   inp_cred = 0xf804ae6ca400,
>>   inp_flow = 0,
>>   inp_flags = 92274688,
>>   inp_flags2 = 16,
>>   inp_vflag = 0 '\0',
>>   inp_ip_ttl = 64 '@',
>>   inp_ip_p = 0 '\0',
>>   inp_ip_minttl = 0 '\0',
>>   inp_flowid = 946611505,
>>   inp_refcount = 2,
>>   inp_pspare = 0xf8031c5707c0,
>>   inp_flowtype = 191,
>>   inp_rss_listen_bucket = 0,
>>   inp_ispare = 0xf8031c5707f0,
>>   inp_inc = {
>> inc_flags = 0 '\0',
>> inc_len = 0 '\0',
>> inc_fibnum = 0,
>> inc_ie = {
>>   ie_fport = 53987,
>>   ie_lport = 47873,
>>   ie_dependfaddr = {
>> ie46_foreign = {
>>   ia46_pad32 = 0xf8031c570808,
>>   ia46_addr4 = {
>> s_addr = 3011802202
>>   }
>> },
>> ie6_foreign = {
>>   __u6_addr = {
>> __u6_addr8 = 0xf8031c570808 "",
>> __u6_addr16 = 0xf8031c570808,
>> __u6_addr32 = 0xf8031c570808
>>   }
>> }
>>   },
>>   ie_dependladdr = {
>> ie46_local = {
>>   ia46_pad32 = 0xf8031c570818,
>>   ia46_addr4 = {
>> s_addr = 4068705883
>>   }
>> },
>> ie6_local = {
>>   __u6_addr = {
>> __u6_addr8 = 0xf8031c570818 "",
>> __u6_addr16 = 0xf8031c570818,
>> __u6_addr32 = 0xf8031c570818
>>   }
>> }
>>   },
>>   ie6_zoneid = 0
>> }
>>   },
>>   inp_label = 0x0,
>>   inp_sp = 0x0,
>>   inp_depend4 = {
>> inp4_ip_tos = 0 '\0',
>> inp4_options = 0x0,
>> inp4_moptions = 0x0
>>   },
>>   inp_depend6 = {
>> inp6_options = 0x0,
>> inp6_outputopts = 0x0,
>> inp6_moptions = 0x0,
>> inp6_icmp6filt = 0x0,
>> inp6_cksum = 0,
>> inp6_hops = 0
>>   },
>>   inp_portlist = {
>> le_next = 0xf80274298ae0,
>> le_prev = 0xf800454999b0
>>   },
>>   inp_phd = 0xf800454999a0,
>>   inp_gencnt = 2119756,
>>   inp_lle = 0x0,
>>   inp_lock = {
>> lock_object = {
>>   lo_name = 0x814e6940 "tcpinp",
>>   lo_flags = 90898432,
>>   lo_data = 0,
>>   lo_witness = 0x0
>> },
>> rw_lock = 18446735277871559936
>>   },
>>   inp_rt_cookie = 10,
>>   inp_rtu = {
>> inpu_route = {
>>   ro_rt = 0x0,
>>   ro_lle = 0x0,
>>   ro_prepend = 0x0,
>>   ro_plen = 0,
>>   ro_flags = 384,
>>   ro_mtu = 0,
>>   spare = 0,
>>   ro_dst = {
>> sa_len = 16 '\020',
>> sa_family = 2 '\002',
>> sa_data = 0xf8031c5708f2 ""
>>   }
>> },
>> inpu_route6 = {
>>   ro_rt = 0x0,
>>   ro_lle = 0x0,
>>   ro_prepend = 0x0,
>>   ro_plen = 0,
>>   ro_flags = 384,
>>   ro_mtu = 0,
>>   spare = 0,
>>   ro_dst = {
>> sin6_len = 16 '\020',
>> sin6_family = 2 '\002',
>> sin6_port = 0,
>> sin6_flowinfo = 3011802202,
>> sin6_addr = {
>>   __u6_addr = {
>> __u6_addr8 = 0xf8031c5708f8 "",
>> __u6_addr16 = 0xf8031c5708f8,
>> __u6_addr32 = 0xf8031c5708f8
>>   }
>> },
>> sin6_scope_id = 0
>>   }
>> }
>>   }
>> }
>> (kgdb)
> 
> Hi,
> 
> Here is the conclusion:
> 
> The following code is going in an infinite loop:
> 
> 
>>for (;;) {
>>TW_RLOCK(V_tw_lock);
>>tw = TAILQ_FIRST(&V_twq_2msl);
>>if (tw == NULL || (!reuse && (tw->tw_time - ticks) > 0)) {
>>TW_RUNLOCK(V_tw_lock);
>>break;
>>}
>>KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == NULL",
>>__func__));
>>inp = tw->tw_inpcb;
>>in_pcbref(inp);
>>TW_RUNLOCK(V_tw_lock);
>>if (INP_INFO_TRY_RLOCK(&V_tcbinfo)) {
>>INP_WLOCK(inp);
>>tw = intotw(inp);
>>if (in_pcbrele_wlocked(inp)) {
> 
> in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in 
> inp->inp_flags2. I guess you have invarian

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Slawa Olhovchenkov
On Tue, Aug 08, 2017 at 10:31:33AM +0200, Hans Petter Selasky wrote:

> Here is the conclusion:
> 
> The following code is going in an infinite loop:
> 
> 
> > for (;;) {
> > TW_RLOCK(V_tw_lock);
> > tw = TAILQ_FIRST(&V_twq_2msl);
> > if (tw == NULL || (!reuse && (tw->tw_time - ticks) > 0)) {
> > TW_RUNLOCK(V_tw_lock);
> > break;
> > }
> > KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == NULL",
> > __func__));
> > 
> > inp = tw->tw_inpcb;
> > in_pcbref(inp);
> > TW_RUNLOCK(V_tw_lock);
> > 
> > if (INP_INFO_TRY_RLOCK(&V_tcbinfo)) {
> > 
> > INP_WLOCK(inp);
> > tw = intotw(inp);
> > if (in_pcbrele_wlocked(inp)) {
> 
> in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in 
> inp->inp_flags2. I guess you have invariants disabled, because the 
> KASSERT() below should have caused a panic.
> 
> > KASSERT(tw == NULL, ("%s: held last inp "
> > "reference but tw not NULL", __func__));
> > INP_INFO_RUNLOCK(&V_tcbinfo);
> > continue;
> > }
> 
> This is a regression issue after:
> 
> > commit 5630210a7f1dbbd903b77b2aef939cd47c63da58
> > Author: jch 
> > Date:   Thu Oct 30 08:53:56 2014 +
> > 
> > Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and
> > tcp_tw_2msl_scan().  This race condition drives unplanned timewait
> > timeout cancellation.  Also simplify implementation by holding inpcb
> > reference and removing tcptw reference counting.
> 
> Suggested fix attached.

Hmm, I am not sure, IMHO between

TW_RUNLOCK(V_tw_lock);
and
if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) {

`inp` can be invalidated, freed and this pointer may be invalid?


> Index: sys/netinet/tcp_timewait.c
> ===
> --- sys/netinet/tcp_timewait.c(revision 321981)
> +++ sys/netinet/tcp_timewait.c(working copy)
> @@ -709,10 +709,11 @@
>   INP_WLOCK(inp);
>   tw = intotw(inp);
>   if (in_pcbrele_wlocked(inp)) {
> - KASSERT(tw == NULL, ("%s: held last inp "
> - "reference but tw not NULL", __func__));
>   INP_INFO_RUNLOCK(&V_tcbinfo);
> - continue;
> + if (tw == NULL)
> + continue;
> + else
> + break;  /* INP_FREED flag is set */
>   }
>  
>   if (tw == NULL) {

> ___
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Hans Petter Selasky

On 08/08/17 13:33, Slawa Olhovchenkov wrote:

TW_RUNLOCK(V_tw_lock);
and
if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) {

`inp` can be invalidated, freed and this pointer may be invalid?


If you look one line up there is a pcbref ??

--HPS
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Slawa Olhovchenkov
On Tue, Aug 08, 2017 at 01:49:08PM +0200, Hans Petter Selasky wrote:

> On 08/08/17 13:33, Slawa Olhovchenkov wrote:
> > TW_RUNLOCK(V_tw_lock);
> > and
> > if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) {
> > 
> > `inp` can be invalidated, freed and this pointer may be invalid?
> 
> If you look one line up there is a pcbref ??

Yes.
Can different thread take this inp and freed it?
May be timer thread?
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Julien Charbon

 Hi,

On 8/8/17 10:31 AM, Hans Petter Selasky wrote:
> On 08/08/17 10:06, Ben RUBSON wrote:
>>> On 08 Aug 2017, at 10:02, Hans Petter Selasky  wrote:
>>>
>>> On 08/08/17 10:00, Ben RUBSON wrote:
 kgdb) print *twq_2msl.tqh_first
 $2 = {
tw_inpcb = 0xf8031c570740,
>>>
>>> print *twq_2msl.tqh_first->tw_inpcb
>>
> 
> Here is the conclusion:
> 
> The following code is going in an infinite loop:
> 
> 
>> for (;;) {
>> TW_RLOCK(V_tw_lock);
>> tw = TAILQ_FIRST(&V_twq_2msl);
>> if (tw == NULL || (!reuse && (tw->tw_time - ticks) >
>> 0)) {
>> TW_RUNLOCK(V_tw_lock);
>> break;
>> }
>> KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb ==
>> NULL",
>> __func__));
>>
>> inp = tw->tw_inpcb;
>> in_pcbref(inp);
>> TW_RUNLOCK(V_tw_lock);
>>
>> if (INP_INFO_TRY_RLOCK(&V_tcbinfo)) {
>>
>> INP_WLOCK(inp);
>> tw = intotw(inp);
>> if (in_pcbrele_wlocked(inp)) {
> 
> in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in
> inp->inp_flags2. I guess you have invariants disabled, because the
> KASSERT() below should have caused a panic.
> 
>> KASSERT(tw == NULL, ("%s: held last inp "
>> "reference but tw not NULL",
>> __func__));
>> INP_INFO_RUNLOCK(&V_tcbinfo);
>> continue;
>> }
> 
> This is a regression issue after:
> 
>> commit 5630210a7f1dbbd903b77b2aef939cd47c63da58
>> Author: jch 
>> Date:   Thu Oct 30 08:53:56 2014 +
>>
>> Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and
>> tcp_tw_2msl_scan().  This race condition drives unplanned timewait
>> timeout cancellation.  Also simplify implementation by holding inpcb
>> reference and removing tcptw reference counting.
> 
> Suggested fix attached.

 I agree we your conclusion.  Just for the record, more precisely this
regression seems to have been introduced with:

commit b02d40ddcda08b51a49e5667e6808f5dc5ec0472
Author: fabient 
Date:   Wed Nov 25 14:45:43 2015 +

The r241129 description was wrong that the scenario is possible
only for read locks on pcbs. The same race can happen with write
lock semantics as well.

The race scenario:

- Two threads (1 and 2) locate pcb with writer semantics
(INPLOOKUP_WLOCKPCB)
 and do in_pcbref() on it.
- 1 and 2 both drop the inp hash lock.
- Another thread (3) grabs the inp hash lock. Then it runs in_pcbfree(),
 which wlocks the pcb. They must happen faster than 1 or 2 come
INP_WLOCK()!
- 1 and 2 congest in INP_WLOCK().
- 3 does in_pcbremlists(), drops hash lock, and runs
in_pcbrele_wlocked(),
 which doesn't free the pcb due to two references on it.
 Then it unlocks the pcb.
- 1 (or 2) gets wlock on the pcb, runs in_pcbrele_wlocked(), which
doesn't
 report inp as freed, due to 2 (or 1) still helding extra reference
on it.
 The thread tries to do smth with a disconnected pcb and crashes.

Submitted by:   emeric.pou...@stormshield.eu
Reviewed by:gleb@
MFC after:  1 week
Sponsored by: Stormshield
Tested by: Cassiano Peixoto, Stormshield

Notes:
svn path=/head/; revision=291301

 Before this change in_pcbrele_wlocked() returned 1 only on the last
reference which is what tcp_tw_2msl_scan() currently expects.

 Thus good catch, and your patch looks good.  I am going to just verify
the other in_pcbrele_wlocked() calls in TCP stack.

 Thanks.

--
Julien
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Hans Petter Selasky

On 08/08/17 13:56, Slawa Olhovchenkov wrote:

On Tue, Aug 08, 2017 at 01:49:08PM +0200, Hans Petter Selasky wrote:


On 08/08/17 13:33, Slawa Olhovchenkov wrote:

TW_RUNLOCK(V_tw_lock);
and
if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) {

`inp` can be invalidated, freed and this pointer may be invalid?


If you look one line up there is a pcbref ??


Yes.
Can different thread take this inp and freed it?
May be timer thread?


No, it cannot be freed while there is a ref.

Some lines down the ref is dropped once the inp pointer is no longer needed.

--HPS
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


[Bug 217637] One TCP connection accepted TWO times

2017-08-08 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217637

Jonathan T. Looney  changed:

   What|Removed |Added

 CC||j...@freebsd.org

--- Comment #89 from Jonathan T. Looney  ---
FWIW, I have created a patch to address the behavior reported in comment #87.
It maintains the RFC-compliant behavior of sending ACKs, but rate limits them.
This isn't perfect, but seems better (to me) than either of the extreme
alternatives (drop every packet or ACK every packet). I'd be curious to have
someone try this and see if it provides effective mitigation.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


[Bug 217637] One TCP connection accepted TWO times

2017-08-08 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217637

--- Comment #90 from Jonathan T. Looney  ---
(In reply to Jonathan T. Looney from comment #89)
Forgot to include a link to the patch:

https://reviews.freebsd.org/D11929

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


[Bug 217637] One TCP connection accepted TWO times

2017-08-08 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217637

--- Comment #91 from Richard Russo  ---
For us, this may have limited effect, because our icmplim is much higher than
the default (16k), because we do want to send closed port RSTs in high
volumewhen our service ports are closed. This patch should keep our system
available, but we'd still be sending out a rather large number of acks on these
broken connections. Given that r317208 means we only see this condition when
the syncache overflows, this may be a reasonable trade off.

I have been thinking that it might be useful if these acks (or other unexpected
packets) would actually push the connection into one of the existing state
recovery behaviors, such as immediately triggering a keepalive sequence if one
wasn't already ongoing. That's certainly a larger change though.

I'll get this patch up somewhere and let you know how it goes.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"