[Bug 211031] [panic] in ng_uncallout when argument is NULL
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211031

Kubilay Kocak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |mfc-stable10?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
[Bug 211031] [panic] in ng_uncallout when argument is NULL
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211031

Kubilay Kocak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |k...@freebsd.org,
                   |                            |r...@freebsd.org
            Version|11.0-BETA1                  |CURRENT
           Keywords|                            |crash, needs-qa, patch
             Status|New                         |Open
                URL|                            |https://reviews.freebsd.org/D7209
              Flags|                            |mfc-stable11?
[Bug 210943] Page fault in ip6_setpktopts when syncthing is started with pflog loaded
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210943

Kubilay Kocak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|freebsd-net@FreeBSD.org     |d...@freebsd.org
              Flags|                            |mfc-stable10?,
                   |                            |mfc-stable11?
         Resolution|FIXED                       |---
             Status|Closed                      |In Progress

--- Comment #6 from Kubilay Kocak ---
Assign to committer that resolved.

Re-open for MFC to stable/11, stable/10.

Please set flag mfc-stable* to + if/when committed, or - if not appropriate
with comment.
[Bug 199096] Kernel panic after some time using mpd (netgraph) and ipfw
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199096

Donald Baud changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |donaldb...@yahoo.com

--- Comment #2 from Donald Baud ---
We started using mpd5.8 (netgraph l2tp) with FreeBSD RELENG-10.3.
This is a replacement for a Cisco 7206 NPE-G (LNS).
It terminates about 300 l2tp tunnels and 800 sessions, at 800 Mbit/s.

I noticed several unresolved bug reports relating to crashes with netgraph:

Bug 199096 Kernel panic after some time using mpd (netgraph) and ipfw
Bug 176401 [netgraph] page fault in netgraph
Bug 154286 [netgraph] [panic] 8.2-PRERELEASE panic in netgraph
Bug 154091 - [netgraph] [panic] netgraph, unaligned mbuf?
Bug 153497 - [netgraph] netgraph panic due to race conditions

I don't want to start a new PR, but I'm seeing many panics, mostly once a day.

I've experimented with the net.graph values, but it didn't make any difference:

# grep net.graph /etc/sysctl.conf ; grep net.graph /boot/loader.conf
#net.graph.maxdgram=524288 #default 20480
#net.graph.recvspace=524288 #default 20480
#net.graph.maxdata=65536 #default 4096
#net.graph.maxalloc=65536#default 4096

# uname -a
FreeBSD mybox.example.com 10.3-RELEASE-p4 FreeBSD 10.3-RELEASE-p4 #0: Sat May 28 12:23:44 UTC 2016 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

# sysctl -a | grep net.graph
net.graph.mppe.max_rekey: 1000
net.graph.mppe.log_max_rekey: 1
net.graph.mppe.block_on_max_rekey: 0
net.graph.control.proto: 2
net.graph.data.proto: 1
net.graph.family: 32
net.graph.recvspace: 20480
net.graph.maxdgram: 20480
net.graph.msg_version: 8
net.graph.abi_version: 12
net.graph.maxdata: 4096
net.graph.maxalloc: 4096
net.graph.threads: 4

# kldstat
Id Refs Address      Size     Name
 1   34 0x8020       17bc6a8  kernel
 2    2 0x81c11000   114db    ipfw.ko
 3    1 0x81c23000   d32f     dummynet.ko
 4    1 0x81c31000   3831     ng_socket.ko
 5    9 0x81c35000   ba02     netgraph.ko
 6    1 0x81c41000   2b99     ng_mppc.ko
 7    1 0x81c44000   80c      rc4.ko
 8    1 0x81c45000   23dc     vmmemctl.ko
 9    1 0x81c48000   397d     ng_l2tp.ko
10    1 0x81c4c000   4b04     ng_ksocket.ko
11    1 0x81c51000   17d6     ng_tee.ko
12    1 0x81c53000   40d2     ng_iface.ko
13    1 0x81c58000   5829     ng_ppp.ko
14    1 0x81c5e000   18b1     ng_tcpmss.ko
15    1 0x81c6       2df7     ng_vjc.ko

# vmstat -z | head -1 ; vmstat -z | grep NetGraph
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
NetGraph items:          72,  4123,       0,    1271,96158617,   0,   0
NetGraph data items:     72,  4123,       4,    1639,244942156,   0,   0

# /etc/rc.conf
ipv6_network_interfaces="none" # Default is auto
ip6addrctl_enable="NO" # New way to disable IPv6 support
devd_enable="NO"
mpd_enable="YES"
quagga_enable="YES"
quagga_daemons="zebra ospfd"

Crash log:

Jul 14 03:20:16 mybox kernel: Fatal trap 12: page fault while in kernel mode
Jul 14 03:20:16 mybox kernel: cpuid = 3; apic id = 03
Jul 14 03:20:16 mybox kernel: fault virtual address = 0x60
Jul 14 03:20:16 mybox kernel: fault code= supervisor read data, page not present
Jul 14 03:20:16 mybox kernel: instruction pointer = 0x20:0x80a27d7a
Jul 14 03:20:16 mybox kernel: stack pointer = 0x28:0xfe0174dcb600
Jul 14 03:20:16 mybox kernel: frame pointer = 0x28:0xfe0174dcb620
Jul 14 03:20:16 mybox kernel: code segment = base 0x0, limit 0xf, type 0x1b
Jul 14 03:20:16 mybox kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Jul 14 03:20:16 mybox kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Jul 14 03:20:16 mybox kernel: current process = 656 (mpd5)
Jul 14 03:20:16 mybox kernel: trap number = 12
Jul 14 03:20:16 mybox kernel: panic: page fault
Jul 14 03:20:16 mybox kernel: cpuid = 3
Jul 14 03:20:16 mybox kernel: KDB: stack backtrace:
Jul 14 03:20:16 mybox kernel: #0 0x8098e390 at kdb_backtrace+0x60
Jul 14 03:20:16 mybox kernel: #1 0x80951066 at vpanic+0x126
Jul 14 03:20:16 mybox kernel: #2 0x80950f33 at panic+0x43
Jul 14 03:20:16 mybox kernel: #3 0x80d55f7b at trap_fatal+0x36b
Jul 14 03:20:16 mybox kernel: #4 0x80d5627d at trap_pfault+0x2ed
Jul 14 03:20:16 mybox kernel: #5 0x80d558fa at trap+0x47a
Jul 14 03:20:16 mybox kernel: #6 0x80d3b8d2 at calltrap+0x8
Jul 14 03:20:16 mybox kernel: #7 0x80a8161d at in_ifadownkill+0x8d
Jul 14 03:20:16 mybox kernel: #8 0x80a26e70 at rn_walktree+0x80
Jul 14 03:20:16 mybox kernel: #9 0x80a8151f at in_ifadown+0x9f
Jul 14 03:20:16 mybox kernel: #10 0x80a76a2c at in_control+0x76c
Jul 14 03:20:16 mybox kernel: #11 0x80
Re: TCP stack lock contention with short-lived connections
Hi,

On 6/28/16 12:06 PM, Julien Charbon wrote:
> On 12/7/15 4:36 PM, Julien Charbon wrote:
>> On 30/05/14 06:12, k simon wrote:
>>> Does any plan commit and MFC to the 10-stable ?
>>
>> I got a bit of interest of having the performance improvements for
>> short-lived TCP connections in 10-stable. Just to share the current
>> status to a wider audience:
>>
>> - I maintain a stack of our TCP performance related patches for
>> 10.2-RELENG here:
>>
>> https://github.com/verisign/freebsd/commits/10.2/tcp-scale
>
> Got more request to MFC TCP stack short-lived connection changes (see
> below) in 10:
>
> #1 Decrease lock contention within the TCP accept case by removing
> the INP_INFO lock from tcp_usr_accept
> https://svnweb.freebsd.org/base?view=revision&revision=271119
>
> #2 In tcp_input(), don't acquire the pcbinfo global write lock for SYN
> packets targeting a listening socket.
> https://svnweb.freebsd.org/base?view=revision&revision=271119
>
> #3 A connection in TIME_WAIT state before calling close() actually did
> not received any RST packet.
> https://svnweb.freebsd.org/base?view=revision&revision=273014
>
> #4 Decompose TCP INP_INFO lock to increase short-lived TCP connections
> scalability
> https://svnweb.freebsd.org/base?view=revision&revision=286227
>
>    Fix a kernel assertion issue introduced with r286227
> https://svnweb.freebsd.org/base?view=revision&revision=286443
>
> #5 Make clear that TIME_WAIT timeout expiration is managed solely by
> tcp_tw_2msl_scan()
> https://svnweb.freebsd.org/base?view=revision&revision=286873
>
> If nobody complains, I plan to MFC them in stable/10. After actually
> quite a bunch of tests as I see a lot of changes that might impact these
> MFCs (like TFO support, etc.).

The MFC of the above commits onto stable/10 is done and tested here:

https://github.com/verisign/freebsd/commits/10/tcp-scale

Nothing particular to report; the result makes stable/10 much closer to
-CURRENT TCP locking-wise.

Patrick and Navdeep, you might want to look (or not) at the last commit
and check that everything looks OK TFO-wise (Patrick) and Chelsio-wise
(Navdeep):

https://github.com/verisign/freebsd/commit/787ff2dec1ad8a7343f86c0f6e759147fc64dac8

I diffed it against -CURRENT, and it looked much the same as the
-CURRENT state. I will push these commits slowly to stable/10 if all is
good.

Thanks.

--
Julien
proposal: splitting NIC RSS up from stack RSS
Hi,

now that 11 is branched and marching on, I'd like to start pushing
some more improvements/evolution into the RSS side of things.

The short list of feedback from people is:

* it'd be nice to be able to configure per-device RSS keys on the fly;
* it'd be nice to be able to configure per-device RSS bucket mappings
  on the fly;
* it'd be nice to be able to configure per-device RSS hash
  configurations on the fly;
* it'd be nice to be able to configure per-bucket CPU set mappings on
  the fly;
* it'd be nice to split the RSS driver side, the RSS packet input side
  and the RSS stack side of things up into separate options;
* UDP IPv6 RSS support would be nice (it works, but I need to
  test/integrate bz's v6 udp locking changes for it to really matter);
* it'd be nice to scale linearly on incoming /and/ outgoing
  connections. Right now incoming connections are easy, but outgoing
  connections aren't so easy.

The other big thing, mostly to be expected, is:

* it'd be nice if this were better documented;
* it'd be nice if we had easy examples of this stuff working, complete
  with library bits in base.

I'm going to tidy up the NetworkRSS bits in the wiki soon and map out
a roadmap for 12 with some other bits and pieces.

The "can we have RSS for NICs but not for the stack, and have
keys/mapping/bucket configurable" is actually a biggish thing, as that
ties into people wanting to abuse things with netmap. They don't care
about the rest of the stack being RSS aware; they just want to be able
to control the NIC configurations from userspace and then get it
completely out of the way.

I'd appreciate any other feedback/comments/suggestions. If you're
using RSS and you haven't told me then please let me know!

thanks,

-adrian
Re: panic with tcp timers
Hi,

On 6/20/16 11:55 AM, Julien Charbon wrote:
> On 6/20/16 9:39 AM, Gleb Smirnoff wrote:
>> On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote:
>> J> > Comparing stable/10 and head, I see two changes that could
>> J> > affect that:
>> J> >
>> J> > - callout_async_drain
>> J> > - switch to READ lock for inp info in tcp timers
>> J> >
>> J> > That's why you are in To, Julien and Hans :)
>> J> >
>> J> > We continue investigating, and I will keep you updated.
>> J> > However, any help is welcome. I can share cores.
>>
>> Now, spending some time with cores and adding a bunch of
>> extra CTRs, I have a sequence of events that lead to the
>> panic. In short, the bug is in the callout system. It seems
>> to be not relevant to the callout_async_drain, at least for
>> now. The transition to READ lock unmasked the problem, that's
>> why NetflixBSD 10 doesn't panic.
>>
>> The panic requires heavy contention on the TCP info lock.
>>
>> [CPU 1] the callout fires, tcp_timer_keep entered
>> [CPU 1] blocks on INP_INFO_RLOCK(&V_tcbinfo);
>> [CPU 2] schedules the callout
>> [CPU 2] tcp_discardcb called
>> [CPU 2] callout successfully canceled
>> [CPU 2] tcpcb freed
>> [CPU 1] unblocks... panic
>>
>> When the lock was WLOCK, all contenders were resumed in a
>> sequence they came to the lock. Now, that they are readers,
>> once the lock is released, readers are resumed in a "random"
>> order, and this allows tcp_discardcb to go before the old
>> running callout, and this unmasks the panic.
>
> Highly interesting. I should be able to reproduce that (will be useful
> for testing the corresponding fix).

Finally, I was able to reproduce it (without glebius' fix). The trick
was to drastically lower the TCP keep timer expiration:

$ sysctl -a | grep tcp.keep
net.inet.tcp.keepidle: 720
net.inet.tcp.keepintvl: 75000
net.inet.tcp.keepinit: 75000
net.inet.tcp.keepcnt: 8

$ sudo bash -c "sysctl net.inet.tcp.keepidle=10 && sysctl net.inet.tcp.keepintvl=50 && sysctl net.inet.tcp.keepinit=10"
Password:
net.inet.tcp.keepidle: 720 -> 10
net.inet.tcp.keepintvl: 75000 -> 50
net.inet.tcp.keepinit: 75000 -> 10

Note: This will almost certainly close all your ssh connections to the
tested server.

Now I will test, in order:

#1. glebius fix
https://svnweb.freebsd.org/base?view=revision&revision=302350

#2. rrs extra fix
https://reviews.freebsd.org/D7135

#3. rrs TCP Timer cleanup
https://reviews.freebsd.org/D7136

My panic, for reference:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 10; apic id = 28
[root@atlas-dl360-4 ~]# instruction pointer = 0x20:0x80c346f1
stack pointer = 0x28:0xfe1f29b848b0
frame pointer = 0x28:0xfe1f29b848e0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 12 (swi4: clock (4))
trap number = 9
panic: general protection fault
cpuid = 10
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe1f29b844a0
vpanic() at vpanic+0x182/frame 0xfe1f29b84520
panic() at panic+0x43/frame 0xfe1f29b84580
trap_fatal() at trap_fatal+0x351/frame 0xfe1f29b845e0
trap() at trap+0x820/frame 0xfe1f29b847f0
calltrap() at calltrap+0x8/frame 0xfe1f29b847f0
--- trap 0x9, rip = 0x80c346f1, rsp = 0xfe1f29b848c0, rbp = 0xfe1f29b848e0 ---
tcp_timer_keep() at tcp_timer_keep+0x51/frame 0xfe1f29b848e0
softclock_call_cc() at softclock_call_cc+0x19c/frame 0xfe1f29b849c0
softclock() at softclock+0x47/frame 0xfe1f29b849e0
intr_event_execute_handlers() at intr_event_execute_handlers+0x96/frame 0xfe1f29b84a20
ithread_loop() at ithread_loop+0xa6/frame 0xfe1f29b84a70
fork_exit() at fork_exit+0x84/frame 0xfe1f29b84ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe1f29b84ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

--
Julien
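
The interleaving Gleb describes in the quoted text (handler entered and blocked on the lock, callout reported as cancelled, tcpcb freed, handler resumes) can be replayed in a small single-threaded user-space model. This is only a sketch for reasoning about the ordering: every `model_*` name is hypothetical, nothing here is a kernel API, and the `honor_running` remedy is an assumption made for illustration, not the change committed in r302350.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct model_tcpcb {
	bool freed;          /* set once the "tcpcb" has been discarded */
	int  keep_fired;     /* handler bodies that ran on live state */
};

struct model_callout {
	bool running;        /* handler entered, possibly blocked on a lock */
	bool cancelled;      /* stop request recorded while it was running */
	struct model_tcpcb *arg;
};

/* [CPU 1] the callout fires; the handler then blocks on the info lock. */
void
model_handler_enter(struct model_callout *c)
{
	c->running = true;
}

/*
 * [CPU 2] discards the connection while the handler is blocked.
 * With honor_running == false the stop is reported as successful and the
 * tcpcb is freed at once, as in the sequence from the thread.  With
 * honor_running == true (assumption) the free is deferred to the handler.
 */
bool
model_discard(struct model_callout *c, bool honor_running)
{
	if (honor_running && c->running) {
		c->cancelled = true;
		return (false);
	}
	c->arg->freed = true;
	return (true);
}

/* [CPU 1] unblocks.  Returns false if it would have touched freed state. */
bool
model_handler_resume(struct model_callout *c)
{
	bool ok = true;

	if (c->cancelled)
		c->arg->freed = true;    /* handler does the deferred free */
	else if (c->arg->freed)
		ok = false;              /* the kernel would panic here */
	else
		c->arg->keep_fired++;
	c->running = false;
	return (ok);
}

/* Replays the whole sequence; true means no use of freed state. */
bool
model_replay(bool honor_running)
{
	struct model_tcpcb tp = { false, 0 };
	struct model_callout c = { false, false, &tp };

	model_handler_enter(&c);
	(void)model_discard(&c, honor_running);
	return (model_handler_resume(&c));
}
```

Calling `model_replay(false)` reproduces the unsafe ordering, while `model_replay(true)` shows that the same ordering is harmless once a stop request on a running handler is not treated as a completed cancel.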
Re: panic with tcp timers
On 2016-07-14 12:01, Julien Charbon wrote:
> Finally, I was able to reproduce it (without glebius fix).

Please see also
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210884

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
[Bug 194109] [lor] if_lagg rmlock <-> if_addr_lock
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194109

Navdeep Parhar changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|CURRENT                     |11.0-STABLE
                 CC|                            |melif...@freebsd.org,
                   |                            |n...@freebsd.org,
                   |                            |r...@freebsd.org,
                   |                            |sbr...@freebsd.org
           Severity|Affects Only Me             |Affects Many People

--- Comment #2 from Navdeep Parhar ---
Deadlocks due to this LOR are easy to reproduce.
https://reviews.freebsd.org/D6845 has a proposed fix (not reviewed yet).
r272211 is mentioned as the possible culprit in that review.

This still occurs on 11.0-BETA1 so I've updated the version and added re@ to
the CC list.
[Bug 211106] [net][fib][loopback][route] loopback route added only to interface fib, even if net.add_addr_allfibs = 1
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211106

Mark Linimon changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|freebsd-b...@freebsd.org    |freebsd-net@FreeBSD.org
callout_drain either broken or man page needs updating
Upon updating my drm-next branch to the latest -CURRENT, callout_drain
returning 1 no longer means that the callout was in fact pending when it
was called. The code below will panic because dwork->wq is NULL, since
the callout was _not_ in fact enqueued. So either it is no longer
possible to reliably query whether a callout was pending while clearing
it, and we're OK with that, or glebius' last commit needs some further
rework.

#define del_timer_sync(timer) (callout_drain(&(timer)->timer_callout) == 1)

static inline bool
flush_delayed_work(struct delayed_work *dwork)
{

        if (del_timer_sync(&dwork->timer))
                linux_queue_work(dwork->cpu, dwork->wq, &dwork->work);
        return (flush_work(&dwork->work));
}
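
As a way to see the failure mode described in this message, here is a small user-space model of the `flush_delayed_work()` pattern. It is a sketch under stated assumptions: the `model_*` types and functions are hypothetical stand-ins, the drain result is simply a preset field rather than a real `callout_drain()` call, and the extra `wq != NULL` guard only masks the symptom; it is not offered as the fix for the return-value change under discussion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a work queue; counts requeue operations. */
struct model_wq {
	int queued;
};

/* Hypothetical stand-in for struct delayed_work. */
struct model_delayed_work {
	struct model_wq *wq;     /* NULL until the work was enqueued once */
	int drain_result;        /* value the stubbed drain call reports */
};

/* Stub in place of callout_drain(): returns whatever was preset. */
int
model_callout_drain(struct model_delayed_work *dw)
{
	return (dw->drain_result);
}

/*
 * Requeue only if the drain reported a pending timer AND the work has a
 * queue to go to.  Returns true if the work was requeued.
 */
bool
model_flush_delayed_work(struct model_delayed_work *dw)
{
	bool was_pending = (model_callout_drain(dw) == 1);

	if (was_pending && dw->wq != NULL) {
		dw->wq->queued++;
		return (true);
	}
	return (false);
}
```

With a drain result of 1 and a NULL `wq`, the unguarded original would dereference NULL; the model instead skips the requeue, which makes the mismatch between "reported pending" and "actually enqueued" visible without crashing.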
Re: callout_drain either broken or man page needs updating
On 07/15/16 05:45, Matthew Macy wrote:
> glebius last commit needs some further re-work.

Hi,

Glebius' commit needs to be backed out, at least the API change that
changes the return value of callout_stop() when the callout is scheduled
and being serviced. Simply because there is code out there, as Matthew
and others have discovered, that is "refcounting" on callout_reset()
and expecting that a subsequent callout_stop() will return 1 to "unref".

If you consider this impossible, maybe a fourth return value is needed
for CANCELLED and DRAINING.

Further, getting the callouts straight in the TCP stack is a matter of
doing the locking correctly, which some have called "my magic bullet",
and not of the return values. I've proposed in the following revision
https://svnweb.freebsd.org/changeset/base/302768 to add a new callout
API that accepts a locking function, so that the callout code can run
its cancelled checks at the right place in situations where more than
one lock is needed.

Consider this case:

        void
        tcp_timer_2msl(void *xtp)
        {
                struct tcpcb *tp = xtp;
                struct inpcb *inp;
                CURVNET_SET(tp->t_vnet);
        #ifdef TCPDEBUG
                int ostate;

                ostate = tp->t_state;
        #endif
                INP_INFO_RLOCK(&V_tcbinfo);
                inp = tp->t_inpcb;
                KASSERT(inp != NULL, ("%s: tp %p tp->t_inpcb == NULL", __func__, tp));
                INP_WLOCK(inp);
                tcp_free_sackholes(tp);
                if (callout_pending(&tp->t_timers->tt_2msl) ||
                    !callout_active(&tp->t_timers->tt_2msl)) {

Here we have a custom in-house race check that doesn't affect the return
value of callout_reset() or callout_stop().

                        INP_WUNLOCK(tp->t_inpcb);
                        INP_INFO_RUNLOCK(&V_tcbinfo);
                        CURVNET_RESTORE();
                        return;

I propose the following solution:

        static void
        tcp_timer_2msl_lock(void *xtp, int do_lock)
        {
                struct tcpcb *tp = xtp;
                struct inpcb *inp;

                inp = tp->t_inpcb;

                if (do_lock) {
                        CURVNET_SET(tp->t_vnet);
                        INP_INFO_RLOCK(&V_tcbinfo);
                        INP_WLOCK(inp);
                } else {
                        INP_WUNLOCK(inp);
                        INP_INFO_RUNLOCK(&V_tcbinfo);
                        CURVNET_RESTORE();
                }
        }

        callout_init_lock_function(&callout, &tcp_timer_2msl_lock,
            CALLOUT_RETURNUNLOCKED);

Then in softclock_call_cc() it will look like this:

                CC_UNLOCK(cc);
                if (c_lock != NULL) {
                        if (have locking function)
                                tcp_timer_2msl_lock(c_arg, 1);
                        else
                                class->lc_lock(c_lock, lock_status);
                        /*
                         * The callout may have been cancelled
                         * while we switched locks.
                         */

Actually "CC_LOCK(cc)" should be in front of cc_exec_cancel() to avoid
races when testing, setting and clearing this variable, as is done in
hps_head.

                        if (cc_exec_cancel(cc, direct)) {
                                if (have locking function)
                                        tcp_timer_2msl_lock(c_arg, 0);
                                else
                                        class->lc_unlock(c_lock);
                                goto skip;
                        }
                        cc_exec_cancel(cc, direct) = true;

        skip:
                if ((c_iflags & CALLOUT_RETURNUNLOCKED) == 0) {
                        if (have locking function)
                                ...
                        else
                                class->lc_unlock(c_lock);
                }

The whole point of this is to make the cancelled check atomic:

1) Lock TCP
2) Lock CC_LOCK()
3) change callout state

--HPS
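
To make the idea in this proposal concrete, the sketch below models a dispatcher that calls a caller-supplied lock function with `do_lock` set to 1 before it tests the cancelled flag, and with 0 on the way out. It is a user-space illustration only: the `model_*` names are hypothetical, the "locks" are a simple depth counter, and it does not reproduce the `CC_LOCK()` ordering or the real `softclock_call_cc()` code.

```c
#include <stdbool.h>

/* Hypothetical timer state for the model. */
struct model_timer {
	int  lock_depth;     /* 1 while the handler's locks are held */
	bool cancelled;      /* set by a concurrent stop request */
	int  handler_runs;   /* times the handler body executed */
};

/* Signature mirrors the proposal: one function both locks and unlocks. */
typedef void (*model_lock_fn)(void *arg, int do_lock);

/* Stand-in for tcp_timer_2msl_lock(): takes or drops all needed locks. */
void
model_timer_lock(void *arg, int do_lock)
{
	struct model_timer *t = arg;

	t->lock_depth += do_lock ? 1 : -1;
}

/*
 * Dispatcher: acquire every lock the handler needs, and only then look
 * at the cancelled flag, so the check and the handler body are covered
 * by the same locks.  Returns true if the handler body ran.
 */
bool
model_softclock_call(struct model_timer *t, model_lock_fn lock_fn)
{
	lock_fn(t, 1);
	if (t->cancelled) {
		lock_fn(t, 0);       /* cancelled while we switched locks */
		return (false);
	}
	t->handler_runs++;           /* handler body */
	lock_fn(t, 0);
	return (true);
}
```

The design point the model tries to capture is that the cancel test happens after all handler locks are held, so a stop request that was issued under those locks cannot be missed.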
Re: refcnt 0 on LLE at boot....
On Thu, 07 Jul 2016 06:36:19 -0700 Larry Rosenman wrote
> Thanks for that. I've added myself to the cc list, and a comment about
> having 2 vmcore's.
>

This was introduced by r302350. It broke the return value of
callout_{stop,drain}, which now return 1 even if the callout system did
not hold a reference. That in turn broke the following code in
lltable_free:

        LIST_FOREACH_SAFE(lle, &dchain, lle_chain, next) {
                if (callout_stop(&lle->lle_timer) > 0)
                        LLE_REMREF(lle);
                llentry_free(lle);
        }

>
> On 2016-07-07 08:28, Edward Tomasz Napierała wrote:
> > FWIW, I'm seeing this too. I've filed a PR:
> >
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210884
> >
> > On 0707T0813, Larry Rosenman wrote:
> >> and now it's been up for 13+ hours. I do have both VMCORE's from the
> >> 2
> >> crashes.
> >>
> >>
> >>
> >> On 2016-07-06 18:22, Larry Rosenman wrote:
> >> > Got a similar crash a few minutes later.
> >> >
> >> >
> >> > On 2016-07-06 18:17, Larry Rosenman wrote:
> >> >> First boot, and I got the following panic. 2nd boot ran just fine.
> >> >>
> >> >>
> >> >> borg.lerctr.org dumped core - see /var/crash/vmcore.0
> >> >>
> >> >> Wed Jul 6 18:13:34 CDT 2016
> >> >>
> >> >> FreeBSD borg.lerctr.org 11.0-ALPHA6 FreeBSD 11.0-ALPHA6 #5 r302379:
> >> >> Wed Jul 6 16:59:11 CDT 2016
> >> >> r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER amd64
> >> >>
> >> >> panic: bogus refcnt 0 on lle 0xf800aa941200
> >> >>
> >> >> GNU gdb 6.1.1 [FreeBSD]
> >> >> Copyright 2004 Free Software Foundation, Inc.
> >> >> GDB is free software, covered by the GNU General Public License, and
> >> >> you are
> >> >> welcome to change it and/or distribute copies of it under certain
> >> >> conditions.
> >> >> Type "show copying" to see the conditions.
> >> >> There is absolutely no warranty for GDB. Type "show warranty" for
> >> >> details.
> >> >> This GDB was configured as "amd64-marcel-freebsd"...
> >> >> > >> >> Unread portion of the kernel message buffer: > >> >> Copyright (c) 1992-2016 The FreeBSD Project. > >> >> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, > >> >> 1994 > >> >> The Regents of the University of California. All rights reserved. > >> >> FreeBSD is a registered trademark of The FreeBSD Foundation. > >> >> FreeBSD 11.0-ALPHA6 #5 r302379: Wed Jul 6 16:59:11 CDT 2016 > >> >> r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER amd64 > >> >> FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on > >> >> LLVM 3.8.0) > >> >> can't re-use a leaf (ixl_rx_miss_bufs)! > >> >> MEMGUARD DEBUGGING ALLOCATOR INITIALIZED: > >> >> MEMGUARD map base: 0xfe40 > >> >> MEMGUARD map size: 128604256 KBytes > >> >> VT(vga): resolution 640x480 > >> >> CPU: Intel(R) Xeon(R) CPU E5410 @ 2.33GHz (2327.55-MHz > >> >> K8-class CPU) > >> >> Origin="GenuineIntel" Id=0x10676 Family=0x6 Model=0x17 > >> >> Stepping=6 > >> >> > >> >> Features=0xbfebfbff > >> >> > >> >> Features2=0xce3bd > >> >> AMD Features=0x20100800 > >> >> AMD Features2=0x1 > >> >> VT-x: HLT,PAUSE > >> >> TSC: P-state invariant, performance statistics > >> >> real memory = 68719476736 (65536 MB) > >> >> avail memory = 65382842368 (62353 MB) > >> >> Event timer "LAPIC" quality 400 > >> >> ACPI APIC Table: > >> >> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs > >> >> FreeBSD/SMP: 2 package(s) x 4 core(s) > >> >> random: unblocking device. 
> >> >> ioapic0 irqs 0-23 on motherboard > >> >> ioapic1 irqs 24-47 on motherboard > >> >> random: entropy device external interface > >> >> netmap: loaded module > >> >> module_register_init: MOD_LOAD (vesa, 0x80f2cb40, 0) error 19 > >> >> kbd1 at kbdmux0 > >> >> vtvga0: on motherboard > >> >> cryptosoft0: on motherboard > >> >> acpi0: on motherboard > >> >> acpi0: Power Button (fixed) > >> >> unknown: I/O range not supported > >> >> cpu0: on acpi0 > >> >> cpu1: on acpi0 > >> >> cpu2: on acpi0 > >> >> cpu3: on acpi0 > >> >> cpu4: on acpi0 > >> >> cpu5: on acpi0 > >> >> cpu6: on acpi0 > >> >> cpu7: on acpi0 > >> >> hpet0: iomem 0xfed0-0xfed003ff irq > >> >> 0,8 on acpi0 > >> >> Timecounter "HPET" frequency 14318180 Hz quality 950 > >> >> Event timer "HPET" frequency 14318180 Hz quality 350 > >> >> Event timer "HPET1" frequency 14318180 Hz quality 340 > >> >> Event timer "HPET2" frequency 14318180 Hz quality 340 > >> >> atrtc0: port 0x70-0x71 on acpi0 > >> >> Event timer "RTC" frequency 32768 Hz quality 0 > >> >> attimer0: port 0x40-0x43,0x50-0x53 on acpi0 > >> >> Timecounter "i8254" frequency 1193182 Hz quality 0 > >> >> Event timer "i8254" frequency 1193182 Hz quality 100 > >> >> Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 > >> >> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0 > >>
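
The `lltable_free` loop quoted at the top of this message drops a reference whenever `callout_stop()` returns a positive value. The user-space model below illustrates why a stop call that reports 1 without an armed timer leads to the "bogus refcnt 0" state. It is a sketch only: the `model_*` names are hypothetical, the reference count is a plain integer, and the assumption that the final free expects at least one remaining reference is taken from the panic message rather than from the kernel source.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an LLE: a refcount plus a timer flag. */
struct model_lle {
	int  refcnt;         /* the armed timer owns one of these */
	bool timer_armed;
};

/* Stop that returns 1 only when it really cancelled an armed timer. */
int
model_stop_strict(struct model_lle *l)
{
	if (l->timer_armed) {
		l->timer_armed = false;
		return (1);
	}
	return (0);
}

/* Stop that always returns 1, modelling the regression in the thread. */
int
model_stop_always_one(struct model_lle *l)
{
	l->timer_armed = false;
	return (1);
}

/*
 * Mirrors the loop body: drop the timer's reference if the stop call
 * says a timer was cancelled.  Returns the refcount that the final
 * free would see; 0 corresponds to the "bogus refcnt 0" panic.
 */
int
model_lle_teardown(struct model_lle *l, int (*stop)(struct model_lle *))
{
	if (stop(l) > 0)
		l->refcnt--;     /* LLE_REMREF in the original loop */
	return (l->refcnt);
}
```

An entry with one reference and no armed timer survives teardown with the strict stop, but reaches the free with a count of zero when the stop call always reports success.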
[Bug 194109] [lor] if_lagg rmlock <-> if_addr_lock
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194109

Kubilay Kocak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |needs-qa, patch
              Flags|                            |mfc-stable11?
             Status|New                         |Open
                URL|                            |https://reviews.freebsd.org/D6845
Re: callout_drain either broken or man page needs updating
On Thu, 14 Jul 2016 21:21:57 -0700 Hans Petter Selasky wrote
> On 07/15/16 05:45, Matthew Macy wrote:
> > glebius last commit needs some further re-work.
>
> Hi,
>
> Glebius commit needs to be backed out, at least the API change that
> changes the return value when calling callout_stop() when the callout is
> scheduled and being serviced. Simply because there is code out there,
> like Mattew and others have discovered that is "refcounting" on the
> callout_reset() and expecting that a subsequent callout_stop() will
> return 1 to "unref".

Yes. This is the cause of the "refcnt 0 on LLE at boot..." regression.

-M