Re: mlx4en, timer irq @100%...
> On 08 Aug 2017, at 08:51, Hans Petter Selasky wrote: > > On 08/08/17 01:52, Ben RUBSON wrote: >>> On 07 Aug 2017, at 19:57, Hans Petter Selasky wrote: >>> >>> On 08/07/17 19:19, Ben RUBSON wrote: > On 07 Aug 2017, at 18:19, Matt Joras wrote: > > On 08/07/2017 09:11, Hans Petter Selasky wrote: >> Hi, >> >> Try to enter "kgdb" and run: >> >> thread apply all bt >> >> Look for the callout function in question. >> >> --HPS >> > If you don't have a way to attach kgdb handy you could also break into > ddb(4) and run "alltrace". Though gdb would be more useful for an > ongoing session if we need more than the backtrace since you could > switch to that thread and investigate it directly. > Hi Hans & Matt, Thank you for your answers, glad to hear from you :) So here is the full kgdb(thread apply all bt) command log : https://benrubson.github.io/kgdb.log We found the faulty thread : # procstat -ak | grep "swi4.*tcp" 12 100029 intr swi4: clock (0) tcp_tw_2msl_scan pfslowtimo softclock_call_cc softclock intr_event_execute_handlers ithread_loop fork_exit fork_trampoline # kgdb (...) Thread 747 (Thread 100029): #0 sched_switch (td=0xf8000f337500, newtd=0xf8010e144000, flags=) at /usr/src/sys/kern/sched_ule.c:1973 #1 0xfe1000f92d80 in ?? () #2 0xfe0f8f74b6e0 in ?? () #3 0x810bd274 in handleevents (now=, fake=Error accessing memory address 0xffcc: Bad address. ) at /usr/src/sys/kern/kern_clocksource.c:223 Previous frame inner to this frame (corrupt stack?) (...) Of course let me know if you need further info. >>> >>> Can you try to dump "td": >>> >>> set print pretty on >>> thread 747 >>> frame 0 >>> print *td >>> >>> It might give some more clues. >> Here it is : >> https://benrubson.github.io/td.log >> Thx ! > > Can you show output from: > > vmstat -z > > Can you from kgdb do: > > print V_twq_2msl > > And follow the next link field and see where it goes? Here is vmstat -z : https://benrubson.github.io/vmstatz.log "print V_twq_2msl" returns the following : No symbol "V_twq_2msl" in current context. Even if I rerun this before (as I exited kgdb) : set print pretty on thread 747 frame 0 Ben ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%...
On 08/08/17 09:04, Ben RUBSON wrote: "print V_twq_2msl" returns the following : No symbol "V_twq_2msl" in current context. Are you using VIMAGE ? --HPS ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%...
On 08/08/17 09:04, Ben RUBSON wrote: Here is vmstat -z : https://benrubson.github.io/vmstatz.log From what I can see there are not TCP allocation failures. This rules out one class of bugs: socket: 864, 2092652, 105, 371, 2318298, 0, 0 unpcb: 240, 2092656, 14,1506, 70953, 0, 0 ipq: 56, 127374, 0, 0, 0, 0, 0 udp_inpcb: 464, 2092656, 12,1044, 1515824, 0, 0 udpcb: 32, 2092750, 12,7363, 1515824, 0, 0 tcp_inpcb: 464, 2092656, 448, 432, 731341, 0, 0 tcpcb: 1040, 2092653, 35, 205, 731341, 0, 0 tcptw: 88, 27810, 413,3862, 161063, 0, 0 syncache: 168, 15364, 0,1357, 17611, 0, 0 hostcache: 128, 0, 0, 0, 0, 0, 0 sackhole:32, 0, 0,5625, 42406, 0, 0 tcpreass:40, 254700, 0,6900, 2276087, 0, 0 --HPS ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%...
> On 08 Aug 2017, at 09:33, Hans Petter Selasky wrote: > > On 08/08/17 09:04, Ben RUBSON wrote: >> "print V_twq_2msl" returns the following : >> No symbol "V_twq_2msl" in current context. > > Are you using VIMAGE ? No, GENERIC FreeBSD 11.0 on a physical server. ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%...
On 08/08/17 09:37, Ben RUBSON wrote: On 08 Aug 2017, at 09:33, Hans Petter Selasky wrote: On 08/08/17 09:04, Ben RUBSON wrote: "print V_twq_2msl" returns the following : No symbol "V_twq_2msl" in current context. Are you using VIMAGE ? No, GENERIC FreeBSD 11.0 on a physical server. Can you try to figure out where this symbol is located. We need to have a dump of it. --HPS ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%...
> On 08 Aug 2017, at 09:38, Hans Petter Selasky wrote: > > On 08/08/17 09:37, Ben RUBSON wrote: >>> On 08 Aug 2017, at 09:33, Hans Petter Selasky wrote: >>> >>> On 08/08/17 09:04, Ben RUBSON wrote: "print V_twq_2msl" returns the following : No symbol "V_twq_2msl" in current context. >>> >>> Are you using VIMAGE ? >> No, GENERIC FreeBSD 11.0 on a physical server. > > Can you try to figure out where this symbol is located. We need to have a > dump of it. OK. I'm quite (well, absolutely) new to kgdb, some clue on how I should proceed ? Thank you ! Ben ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%...
> On 08 Aug 2017, at 09:38, Hans Petter Selasky wrote: > > On 08/08/17 09:37, Ben RUBSON wrote: >>> On 08 Aug 2017, at 09:33, Hans Petter Selasky wrote: >>> >>> On 08/08/17 09:04, Ben RUBSON wrote: "print V_twq_2msl" returns the following : No symbol "V_twq_2msl" in current context. >>> >>> Are you using VIMAGE ? >> No, GENERIC FreeBSD 11.0 on a physical server. > > Can you try to figure out where this symbol is located. We need to have a > dump of it. Perhaps we can't find the symbol due to the frame error which seems to be reported ? Thread 747 (Thread 100029): #0 sched_switch (td=0xf8000f337500, newtd=0xf8010e144000, flags=) at /usr/src/sys/kern/sched_ule.c:1973 #1 0xfe1000f92d80 in ?? () #2 0xfe0f8f74b6e0 in ?? () #3 0x810bd274 in handleevents (now=, fake=Error accessing memory address 0xffcc: Bad address. ) at /usr/src/sys/kern/kern_clocksource.c:223 Previous frame inner to this frame (corrupt stack?) ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%...
On 08/08/17 09:43, Ben RUBSON wrote: OK. I'm quite (well, absolutely) new to kgdb, some clue on how I should proceed ? Thank you ! Ben print twq_2msl print *twq_2msl.tqh_first --HPS ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%...
> On 08 Aug 2017, at 09:54, Hans Petter Selasky wrote: > > On 08/08/17 09:43, Ben RUBSON wrote: >> OK. >> I'm quite (well, absolutely) new to kgdb, some clue on how I should proceed ? >> Thank you ! >> Ben > > print twq_2msl > print *twq_2msl.tqh_first (kgdb) set print pretty on (kgdb) print twq_2msl $1 = { tqh_first = 0xf80d1bf12210, tqh_last = 0xf808b73433a8 } (kgdb) print *twq_2msl.tqh_first $2 = { tw_inpcb = 0xf8031c570740, snd_nxt = 4140013171, rcv_nxt = 3258392061, iss = 4140006661, irs = 3258390776, last_win = 513, tw_so_options = 12, tw_cred = 0xf804ae6ca400, t_recent = 1263366217, ts_offset = 4279193351, t_starttime = 606489428, tw_time = 606516815, tw_2msl = { tqe_next = 0xf80ff71453c8, tqe_prev = 0x81d763a0 }, tw_pspare = 0x0, tw_spare = 0x0 } (kgdb) ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%...
On 08/08/17 10:00, Ben RUBSON wrote: kgdb) print *twq_2msl.tqh_first $2 = { tw_inpcb = 0xf8031c570740, print *twq_2msl.tqh_first->tw_inpcb --HPS ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%...
> On 08 Aug 2017, at 10:02, Hans Petter Selasky wrote: > > On 08/08/17 10:00, Ben RUBSON wrote: >> kgdb) print *twq_2msl.tqh_first >> $2 = { >> tw_inpcb = 0xf8031c570740, > > print *twq_2msl.tqh_first->tw_inpcb (kgdb) print *twq_2msl.tqh_first->tw_inpcb $3 = { inp_hash = { le_next = 0x0, le_prev = 0xfe000f78adb8 }, inp_pcbgrouphash = { le_next = 0x0, le_prev = 0x0 }, inp_list = { le_next = 0xf80c2a07f570, le_prev = 0x81e15e20 }, inp_ppcb = 0xf80d1bf12210, inp_pcbinfo = 0x81e15e28, inp_pcbgroup = 0x0, inp_pcbgroup_wild = { le_next = 0x0, le_prev = 0x0 }, inp_socket = 0x0, inp_cred = 0xf804ae6ca400, inp_flow = 0, inp_flags = 92274688, inp_flags2 = 16, inp_vflag = 0 '\0', inp_ip_ttl = 64 '@', inp_ip_p = 0 '\0', inp_ip_minttl = 0 '\0', inp_flowid = 946611505, inp_refcount = 2, inp_pspare = 0xf8031c5707c0, inp_flowtype = 191, inp_rss_listen_bucket = 0, inp_ispare = 0xf8031c5707f0, inp_inc = { inc_flags = 0 '\0', inc_len = 0 '\0', inc_fibnum = 0, inc_ie = { ie_fport = 53987, ie_lport = 47873, ie_dependfaddr = { ie46_foreign = { ia46_pad32 = 0xf8031c570808, ia46_addr4 = { s_addr = 3011802202 } }, ie6_foreign = { __u6_addr = { __u6_addr8 = 0xf8031c570808 "", __u6_addr16 = 0xf8031c570808, __u6_addr32 = 0xf8031c570808 } } }, ie_dependladdr = { ie46_local = { ia46_pad32 = 0xf8031c570818, ia46_addr4 = { s_addr = 4068705883 } }, ie6_local = { __u6_addr = { __u6_addr8 = 0xf8031c570818 "", __u6_addr16 = 0xf8031c570818, __u6_addr32 = 0xf8031c570818 } } }, ie6_zoneid = 0 } }, inp_label = 0x0, inp_sp = 0x0, inp_depend4 = { inp4_ip_tos = 0 '\0', inp4_options = 0x0, inp4_moptions = 0x0 }, inp_depend6 = { inp6_options = 0x0, inp6_outputopts = 0x0, inp6_moptions = 0x0, inp6_icmp6filt = 0x0, inp6_cksum = 0, inp6_hops = 0 }, inp_portlist = { le_next = 0xf80274298ae0, le_prev = 0xf800454999b0 }, inp_phd = 0xf800454999a0, inp_gencnt = 2119756, inp_lle = 0x0, inp_lock = { lock_object = { lo_name = 0x814e6940 "tcpinp", lo_flags = 90898432, lo_data = 0, lo_witness = 0x0 }, rw_lock = 18446735277871559936 }, inp_rt_cookie = 10, inp_rtu = { inpu_route = { ro_rt = 0x0, ro_lle = 0x0, ro_prepend = 0x0, ro_plen = 0, ro_flags = 384, ro_mtu = 0, spare = 0, ro_dst = { sa_len = 16 '\020', sa_family = 2 '\002', sa_data = 0xf8031c5708f2 "" } }, inpu_route6 = { ro_rt = 0x0, ro_lle = 0x0, ro_prepend = 0x0, ro_plen = 0, ro_flags = 384, ro_mtu = 0, spare = 0, ro_dst = { sin6_len = 16 '\020', sin6_family = 2 '\002', sin6_port = 0, sin6_flowinfo = 3011802202, sin6_addr = { __u6_addr = { __u6_addr8 = 0xf8031c5708f8 "", __u6_addr16 = 0xf8031c5708f8, __u6_addr32 = 0xf8031c5708f8 } }, sin6_scope_id = 0 } } } } (kgdb) ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
On 08/08/17 10:06, Ben RUBSON wrote: On 08 Aug 2017, at 10:02, Hans Petter Selasky wrote: On 08/08/17 10:00, Ben RUBSON wrote: kgdb) print *twq_2msl.tqh_first $2 = { tw_inpcb = 0xf8031c570740, print *twq_2msl.tqh_first->tw_inpcb (kgdb) print *twq_2msl.tqh_first->tw_inpcb $3 = { inp_hash = { le_next = 0x0, le_prev = 0xfe000f78adb8 }, inp_pcbgrouphash = { le_next = 0x0, le_prev = 0x0 }, inp_list = { le_next = 0xf80c2a07f570, le_prev = 0x81e15e20 }, inp_ppcb = 0xf80d1bf12210, inp_pcbinfo = 0x81e15e28, inp_pcbgroup = 0x0, inp_pcbgroup_wild = { le_next = 0x0, le_prev = 0x0 }, inp_socket = 0x0, inp_cred = 0xf804ae6ca400, inp_flow = 0, inp_flags = 92274688, inp_flags2 = 16, inp_vflag = 0 '\0', inp_ip_ttl = 64 '@', inp_ip_p = 0 '\0', inp_ip_minttl = 0 '\0', inp_flowid = 946611505, inp_refcount = 2, inp_pspare = 0xf8031c5707c0, inp_flowtype = 191, inp_rss_listen_bucket = 0, inp_ispare = 0xf8031c5707f0, inp_inc = { inc_flags = 0 '\0', inc_len = 0 '\0', inc_fibnum = 0, inc_ie = { ie_fport = 53987, ie_lport = 47873, ie_dependfaddr = { ie46_foreign = { ia46_pad32 = 0xf8031c570808, ia46_addr4 = { s_addr = 3011802202 } }, ie6_foreign = { __u6_addr = { __u6_addr8 = 0xf8031c570808 "", __u6_addr16 = 0xf8031c570808, __u6_addr32 = 0xf8031c570808 } } }, ie_dependladdr = { ie46_local = { ia46_pad32 = 0xf8031c570818, ia46_addr4 = { s_addr = 4068705883 } }, ie6_local = { __u6_addr = { __u6_addr8 = 0xf8031c570818 "", __u6_addr16 = 0xf8031c570818, __u6_addr32 = 0xf8031c570818 } } }, ie6_zoneid = 0 } }, inp_label = 0x0, inp_sp = 0x0, inp_depend4 = { inp4_ip_tos = 0 '\0', inp4_options = 0x0, inp4_moptions = 0x0 }, inp_depend6 = { inp6_options = 0x0, inp6_outputopts = 0x0, inp6_moptions = 0x0, inp6_icmp6filt = 0x0, inp6_cksum = 0, inp6_hops = 0 }, inp_portlist = { le_next = 0xf80274298ae0, le_prev = 0xf800454999b0 }, inp_phd = 0xf800454999a0, inp_gencnt = 2119756, inp_lle = 0x0, inp_lock = { lock_object = { lo_name = 0x814e6940 "tcpinp", lo_flags = 90898432, lo_data = 0, lo_witness = 0x0 }, rw_lock = 18446735277871559936 }, inp_rt_cookie = 10, inp_rtu = { inpu_route = { ro_rt = 0x0, ro_lle = 0x0, ro_prepend = 0x0, ro_plen = 0, ro_flags = 384, ro_mtu = 0, spare = 0, ro_dst = { sa_len = 16 '\020', sa_family = 2 '\002', sa_data = 0xf8031c5708f2 "" } }, inpu_route6 = { ro_rt = 0x0, ro_lle = 0x0, ro_prepend = 0x0, ro_plen = 0, ro_flags = 384, ro_mtu = 0, spare = 0, ro_dst = { sin6_len = 16 '\020', sin6_family = 2 '\002', sin6_port = 0, sin6_flowinfo = 3011802202, sin6_addr = { __u6_addr = { __u6_addr8 = 0xf8031c5708f8 "", __u6_addr16 = 0xf8031c5708f8, __u6_addr32 = 0xf8031c5708f8 } }, sin6_scope_id = 0 } } } } (kgdb) Hi, Here is the conclusion: The following code is going in an infinite loop: for (;;) { TW_RLOCK(V_tw_lock); tw = TAILQ_FIRST(&V_twq_2msl); if (tw == NULL || (!reuse && (tw->tw_time - ticks) > 0)) { TW_RUNLOCK(V_tw_lock); break; } KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == NULL", __func__)); inp = tw->tw_inpcb; in_pcbref(inp); TW_RUNLOCK(V_tw_lock); if (INP_INFO_TRY_RLOCK(&V_tcbinfo)) { INP_WLOCK(inp); tw = intotw(inp); if (in_pcbrele_wlocked(inp)) { in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in inp->inp_flags2. I guess you have invariants disabled, because the KASSERT() below should have caused a panic. KASSERT(tw == NULL, ("%s: held last inp " "reference but tw not NULL", __func__)); INP_INFO_RUNLOCK(&V_tcbinfo); continue; } This is a regression issue after: commit 5630210a7f1dbbd903b77b2aef939cd47c63da58
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
> On 08 Aug 2017, at 10:31, Hans Petter Selasky wrote: > > On 08/08/17 10:06, Ben RUBSON wrote: >>> On 08 Aug 2017, at 10:02, Hans Petter Selasky wrote: >>> >>> On 08/08/17 10:00, Ben RUBSON wrote: kgdb) print *twq_2msl.tqh_first $2 = { tw_inpcb = 0xf8031c570740, >>> >>> print *twq_2msl.tqh_first->tw_inpcb >> (kgdb) print *twq_2msl.tqh_first->tw_inpcb >> $3 = { >> inp_hash = { >> le_next = 0x0, >> le_prev = 0xfe000f78adb8 >> }, >> inp_pcbgrouphash = { >> le_next = 0x0, >> le_prev = 0x0 >> }, >> inp_list = { >> le_next = 0xf80c2a07f570, >> le_prev = 0x81e15e20 >> }, >> inp_ppcb = 0xf80d1bf12210, >> inp_pcbinfo = 0x81e15e28, >> inp_pcbgroup = 0x0, >> inp_pcbgroup_wild = { >> le_next = 0x0, >> le_prev = 0x0 >> }, >> inp_socket = 0x0, >> inp_cred = 0xf804ae6ca400, >> inp_flow = 0, >> inp_flags = 92274688, >> inp_flags2 = 16, >> inp_vflag = 0 '\0', >> inp_ip_ttl = 64 '@', >> inp_ip_p = 0 '\0', >> inp_ip_minttl = 0 '\0', >> inp_flowid = 946611505, >> inp_refcount = 2, >> inp_pspare = 0xf8031c5707c0, >> inp_flowtype = 191, >> inp_rss_listen_bucket = 0, >> inp_ispare = 0xf8031c5707f0, >> inp_inc = { >> inc_flags = 0 '\0', >> inc_len = 0 '\0', >> inc_fibnum = 0, >> inc_ie = { >> ie_fport = 53987, >> ie_lport = 47873, >> ie_dependfaddr = { >> ie46_foreign = { >> ia46_pad32 = 0xf8031c570808, >> ia46_addr4 = { >> s_addr = 3011802202 >> } >> }, >> ie6_foreign = { >> __u6_addr = { >> __u6_addr8 = 0xf8031c570808 "", >> __u6_addr16 = 0xf8031c570808, >> __u6_addr32 = 0xf8031c570808 >> } >> } >> }, >> ie_dependladdr = { >> ie46_local = { >> ia46_pad32 = 0xf8031c570818, >> ia46_addr4 = { >> s_addr = 4068705883 >> } >> }, >> ie6_local = { >> __u6_addr = { >> __u6_addr8 = 0xf8031c570818 "", >> __u6_addr16 = 0xf8031c570818, >> __u6_addr32 = 0xf8031c570818 >> } >> } >> }, >> ie6_zoneid = 0 >> } >> }, >> inp_label = 0x0, >> inp_sp = 0x0, >> inp_depend4 = { >> inp4_ip_tos = 0 '\0', >> inp4_options = 0x0, >> inp4_moptions = 0x0 >> }, >> inp_depend6 = { >> inp6_options = 0x0, >> inp6_outputopts = 0x0, >> inp6_moptions = 0x0, >> inp6_icmp6filt = 0x0, >> inp6_cksum = 0, >> inp6_hops = 0 >> }, >> inp_portlist = { >> le_next = 0xf80274298ae0, >> le_prev = 0xf800454999b0 >> }, >> inp_phd = 0xf800454999a0, >> inp_gencnt = 2119756, >> inp_lle = 0x0, >> inp_lock = { >> lock_object = { >> lo_name = 0x814e6940 "tcpinp", >> lo_flags = 90898432, >> lo_data = 0, >> lo_witness = 0x0 >> }, >> rw_lock = 18446735277871559936 >> }, >> inp_rt_cookie = 10, >> inp_rtu = { >> inpu_route = { >> ro_rt = 0x0, >> ro_lle = 0x0, >> ro_prepend = 0x0, >> ro_plen = 0, >> ro_flags = 384, >> ro_mtu = 0, >> spare = 0, >> ro_dst = { >> sa_len = 16 '\020', >> sa_family = 2 '\002', >> sa_data = 0xf8031c5708f2 "" >> } >> }, >> inpu_route6 = { >> ro_rt = 0x0, >> ro_lle = 0x0, >> ro_prepend = 0x0, >> ro_plen = 0, >> ro_flags = 384, >> ro_mtu = 0, >> spare = 0, >> ro_dst = { >> sin6_len = 16 '\020', >> sin6_family = 2 '\002', >> sin6_port = 0, >> sin6_flowinfo = 3011802202, >> sin6_addr = { >> __u6_addr = { >> __u6_addr8 = 0xf8031c5708f8 "", >> __u6_addr16 = 0xf8031c5708f8, >> __u6_addr32 = 0xf8031c5708f8 >> } >> }, >> sin6_scope_id = 0 >> } >> } >> } >> } >> (kgdb) > > Hi, > > Here is the conclusion: > > The following code is going in an infinite loop: > > >>for (;;) { >>TW_RLOCK(V_tw_lock); >>tw = TAILQ_FIRST(&V_twq_2msl); >>if (tw == NULL || (!reuse && (tw->tw_time - ticks) > 0)) { >>TW_RUNLOCK(V_tw_lock); >>break; >>} >>KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == NULL", >>__func__)); >>inp = tw->tw_inpcb; >>in_pcbref(inp); >>TW_RUNLOCK(V_tw_lock); >>if (INP_INFO_TRY_RLOCK(&V_tcbinfo)) { >>INP_WLOCK(inp); >>tw = intotw(inp); >>if (in_pcbrele_wlocked(inp)) { > > in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in > inp->inp_flags2. I guess you have invarian
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
On Tue, Aug 08, 2017 at 10:31:33AM +0200, Hans Petter Selasky wrote: > Here is the conclusion: > > The following code is going in an infinite loop: > > > > for (;;) { > > TW_RLOCK(V_tw_lock); > > tw = TAILQ_FIRST(&V_twq_2msl); > > if (tw == NULL || (!reuse && (tw->tw_time - ticks) > 0)) { > > TW_RUNLOCK(V_tw_lock); > > break; > > } > > KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == NULL", > > __func__)); > > > > inp = tw->tw_inpcb; > > in_pcbref(inp); > > TW_RUNLOCK(V_tw_lock); > > > > if (INP_INFO_TRY_RLOCK(&V_tcbinfo)) { > > > > INP_WLOCK(inp); > > tw = intotw(inp); > > if (in_pcbrele_wlocked(inp)) { > > in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in > inp->inp_flags2. I guess you have invariants disabled, because the > KASSERT() below should have caused a panic. > > > KASSERT(tw == NULL, ("%s: held last inp " > > "reference but tw not NULL", __func__)); > > INP_INFO_RUNLOCK(&V_tcbinfo); > > continue; > > } > > This is a regression issue after: > > > commit 5630210a7f1dbbd903b77b2aef939cd47c63da58 > > Author: jch > > Date: Thu Oct 30 08:53:56 2014 + > > > > Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and > > tcp_tw_2msl_scan(). This race condition drives unplanned timewait > > timeout cancellation. Also simplify implementation by holding inpcb > > reference and removing tcptw reference counting. > > Suggested fix attached. Hmm, I am not sure, IMHO between TW_RUNLOCK(V_tw_lock); and if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) { `inp` can be invalidated, freed and this pointer may be invalid? > Index: sys/netinet/tcp_timewait.c > === > --- sys/netinet/tcp_timewait.c(revision 321981) > +++ sys/netinet/tcp_timewait.c(working copy) > @@ -709,10 +709,11 @@ > INP_WLOCK(inp); > tw = intotw(inp); > if (in_pcbrele_wlocked(inp)) { > - KASSERT(tw == NULL, ("%s: held last inp " > - "reference but tw not NULL", __func__)); > INP_INFO_RUNLOCK(&V_tcbinfo); > - continue; > + if (tw == NULL) > + continue; > + else > + break; /* INP_FREED flag is set */ > } > > if (tw == NULL) { > ___ > freebsd-net@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org" ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
On 08/08/17 13:33, Slawa Olhovchenkov wrote: TW_RUNLOCK(V_tw_lock); and if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) { `inp` can be invalidated, freed and this pointer may be invalid? If you look one line up there is a pcbref ?? --HPS ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
On Tue, Aug 08, 2017 at 01:49:08PM +0200, Hans Petter Selasky wrote: > On 08/08/17 13:33, Slawa Olhovchenkov wrote: > > TW_RUNLOCK(V_tw_lock); > > and > > if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) { > > > > `inp` can be invalidated, freed and this pointer may be invalid? > > If you look one line up there is a pcbref ?? Yes. Can different thread take this inp and freed it? May be timer thread? ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
Hi, On 8/8/17 10:31 AM, Hans Petter Selasky wrote: > On 08/08/17 10:06, Ben RUBSON wrote: >>> On 08 Aug 2017, at 10:02, Hans Petter Selasky wrote: >>> >>> On 08/08/17 10:00, Ben RUBSON wrote: kgdb) print *twq_2msl.tqh_first $2 = { tw_inpcb = 0xf8031c570740, >>> >>> print *twq_2msl.tqh_first->tw_inpcb >> > > Here is the conclusion: > > The following code is going in an infinite loop: > > >> for (;;) { >> TW_RLOCK(V_tw_lock); >> tw = TAILQ_FIRST(&V_twq_2msl); >> if (tw == NULL || (!reuse && (tw->tw_time - ticks) > >> 0)) { >> TW_RUNLOCK(V_tw_lock); >> break; >> } >> KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == >> NULL", >> __func__)); >> >> inp = tw->tw_inpcb; >> in_pcbref(inp); >> TW_RUNLOCK(V_tw_lock); >> >> if (INP_INFO_TRY_RLOCK(&V_tcbinfo)) { >> >> INP_WLOCK(inp); >> tw = intotw(inp); >> if (in_pcbrele_wlocked(inp)) { > > in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in > inp->inp_flags2. I guess you have invariants disabled, because the > KASSERT() below should have caused a panic. > >> KASSERT(tw == NULL, ("%s: held last inp " >> "reference but tw not NULL", >> __func__)); >> INP_INFO_RUNLOCK(&V_tcbinfo); >> continue; >> } > > This is a regression issue after: > >> commit 5630210a7f1dbbd903b77b2aef939cd47c63da58 >> Author: jch >> Date: Thu Oct 30 08:53:56 2014 + >> >> Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and >> tcp_tw_2msl_scan(). This race condition drives unplanned timewait >> timeout cancellation. Also simplify implementation by holding inpcb >> reference and removing tcptw reference counting. > > Suggested fix attached. I agree we your conclusion. Just for the record, more precisely this regression seems to have been introduced with: commit b02d40ddcda08b51a49e5667e6808f5dc5ec0472 Author: fabient Date: Wed Nov 25 14:45:43 2015 + The r241129 description was wrong that the scenario is possible only for read locks on pcbs. The same race can happen with write lock semantics as well. The race scenario: - Two threads (1 and 2) locate pcb with writer semantics (INPLOOKUP_WLOCKPCB) and do in_pcbref() on it. - 1 and 2 both drop the inp hash lock. - Another thread (3) grabs the inp hash lock. Then it runs in_pcbfree(), which wlocks the pcb. They must happen faster than 1 or 2 come INP_WLOCK()! - 1 and 2 congest in INP_WLOCK(). - 3 does in_pcbremlists(), drops hash lock, and runs in_pcbrele_wlocked(), which doesn't free the pcb due to two references on it. Then it unlocks the pcb. - 1 (or 2) gets wlock on the pcb, runs in_pcbrele_wlocked(), which doesn't report inp as freed, due to 2 (or 1) still helding extra reference on it. The thread tries to do smth with a disconnected pcb and crashes. Submitted by: emeric.pou...@stormshield.eu Reviewed by:gleb@ MFC after: 1 week Sponsored by: Stormshield Tested by: Cassiano Peixoto, Stormshield Notes: svn path=/head/; revision=291301 Before this change in_pcbrele_wlocked() returned 1 only on the last reference which is what tcp_tw_2msl_scan() currently expects. Thus good catch, and your patch looks good. I am going to just verify the other in_pcbrele_wlocked() calls in TCP stack. Thanks. -- Julien ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
On 08/08/17 13:56, Slawa Olhovchenkov wrote: On Tue, Aug 08, 2017 at 01:49:08PM +0200, Hans Petter Selasky wrote: On 08/08/17 13:33, Slawa Olhovchenkov wrote: TW_RUNLOCK(V_tw_lock); and if (INP_INFO_TRY_WLOCK(&V_tcbinfo)) { `inp` can be invalidated, freed and this pointer may be invalid? If you look one line up there is a pcbref ?? Yes. Can different thread take this inp and freed it? May be timer thread? No, it cannot be freed while there is a ref. Some lines down the ref is dropped once the inp pointer is no longer needed. --HPS ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
[Bug 217637] One TCP connection accepted TWO times
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217637 Jonathan T. Looney changed: What|Removed |Added CC||j...@freebsd.org --- Comment #89 from Jonathan T. Looney --- FWIW, I have created a patch to address the behavior reported in comment #87. It maintains the RFC-compliant behavior of sending ACKs, but rate limits them. This isn't perfect, but seems better (to me) than either of the extreme alternatives (drop every packet or ACK every packet). I'd be curious to have someone try this and see if it provides effective mitigation. -- You are receiving this mail because: You are on the CC list for the bug. ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
[Bug 217637] One TCP connection accepted TWO times
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217637 --- Comment #90 from Jonathan T. Looney --- (In reply to Jonathan T. Looney from comment #89) Forgot to include a link to the patch: https://reviews.freebsd.org/D11929 -- You are receiving this mail because: You are on the CC list for the bug. ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
[Bug 217637] One TCP connection accepted TWO times
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217637 --- Comment #91 from Richard Russo --- For us, this may have limited effect, because our icmplim is much higher than the default (16k), because we do want to send closed port RSTs in high volumewhen our service ports are closed. This patch should keep our system available, but we'd still be sending out a rather large number of acks on these broken connections. Given that r317208 means we only see this condition when the syncache overflows, this may be a reasonable trade off. I have been thinking that it might be useful if these acks (or other unexpected packets) would actually push the connection into one of the existing state recovery behaviors, such as immediately triggering a keepalive sequence if one wasn't already ongoing. That's certainly a larger change though. I'll get this patch up somewhere and let you know how it goes. -- You are receiving this mail because: You are on the CC list for the bug. ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"