On Tue, May 7, 2013 at 4:06 PM, Xin Li <delp...@delphij.net> wrote:
> On 05/07/13 15:03, Garrett Cooper wrote:
>> Saw the following LOR on a CURRENT build as of yesterday with an
>> almost idle machine processing ARP requests:
>>
>> root@wf220:/mnt # taskqueue_drain with the following non-sleepable locks held:
>> exclusive rw lle (lle) r = 0 (0xfffffe001450b410) locked @ /usr/src/sys/netinet/in.c:1484
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffff848d4f7690
>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffff848d4f7740
>> witness_warn() at witness_warn+0x4a8/frame 0xffffff848d4f7800
>> taskqueue_drain() at taskqueue_drain+0x3a/frame 0xffffff848d4f7840
>> set_timeout() at set_timeout+0x4a/frame 0xffffff848d4f7860
>> netevent_callback() at netevent_callback+0x16/frame 0xffffff848d4f7870
>> arpintr() at arpintr+0x9b5/frame 0xffffff848d4f7930
>> netisr_dispatch_src() at netisr_dispatch_src+0x60/frame 0xffffff848d4f79a0
>> ether_demux() at ether_demux+0x130/frame 0xffffff848d4f79d0
>> ether_nh_input() at ether_nh_input+0x369/frame 0xffffff848d4f7a30
>> netisr_dispatch_src() at netisr_dispatch_src+0x60/frame 0xffffff848d4f7aa0
>> em_rxeof() at em_rxeof+0x30e/frame 0xffffff848d4f7b10
>> em_msix_rx() at em_msix_rx+0x33/frame 0xffffff848d4f7b40
>> intr_event_execute_handlers() at intr_event_execute_handlers+0x80/frame 0xffffff848d4f7b70
>> ithread_loop() at ithread_loop+0x128/frame 0xffffff848d4f7bb0
>> fork_exit() at fork_exit+0x71/frame 0xffffff848d4f7bf0
>> fork_trampoline() at fork_trampoline+0xe/frame 0xffffff848d4f7bf0
>> --- trap 0, rip = 0, rsp = 0xffffff848d4f7cb0, rbp = 0 ---
>> root@wf220:/mnt # uname -a
>> FreeBSD wf220.west.isilon.com 10.0-CURRENT FreeBSD 10.0-CURRENT #1: Tue May 7 08:04:59 PDT 2013 r...@wf220.west.isilon.com:/usr/obj/usr/src/sys/ISI-GENERIC amd64
>>
>> I've seen this issue before for a few weeks/months, so it's nothing
>> new (but probably should be fixed...). Thanks!
>
> This has nothing to do with em(4) but looks like a bug in our Linux
> compatibility wrapper. In the InfiniBand code, its
> _handle_arp_update_event() calls netevent_callback() with
> NETEVENT_NEIGH_UPDATE, where a cancel_delayed_work() causes the drain.
>
> Looking at the Linux code, it seems that we just shouldn't do the
> drain in the cancel_delayed_work() wrapper
> (sys/ofed/include/linux/workqueue.h), so it seems like we need
> something like this:
>
> Index: sys/ofed/include/linux/workqueue.h
> ===================================================================
> --- sys/ofed/include/linux/workqueue.h (revision 250337)
> +++ sys/ofed/include/linux/workqueue.h (working copy)
> @@ -184,9 +184,9 @@
>  {
>
>  	callout_stop(&work->timer);
> -	if (work->work.taskqueue &&
> -	    taskqueue_cancel(work->work.taskqueue, &work->work.work_task, NULL))
> -		taskqueue_drain(work->work.taskqueue, &work->work.work_task);
> +	if (work->work.taskqueue)
> +		return (taskqueue_cancel(work->work.taskqueue,
> +		    &work->work.work_task, NULL) != 0);
>  	return 0;
>  }
>
> I've added Jeff to Cc.
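
For the archive, here is a rough userspace sketch of why the drain is the
problem and why the patched wrapper avoids it. Nothing here is kernel code:
fake_task, fake_cancel(), fake_drain(), cancel_old(), cancel_new() and the two
counters are hypothetical stand-ins I made up for illustration. The only
assumption carried over from the report is that draining may sleep and so must
not happen while a non-sleepable lock (the lle rwlock in the backtrace) is
held.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical userspace stand-ins, not kernel APIs. */

static int nonsleepable_held;	/* number of non-sleepable locks "held" */
static int sleep_violations;	/* may-sleep calls made under such a lock */

struct fake_task {
	bool pending;		/* queued, handler not yet started */
	bool running;		/* handler currently executing */
};

/* Dequeue without waiting; nonzero means the handler is already running. */
static int
fake_cancel(struct fake_task *t)
{
	t->pending = false;
	return (t->running ? 1 : 0);
}

/* Wait for the handler to finish; waiting may sleep, so count a violation. */
static void
fake_drain(struct fake_task *t)
{
	if (nonsleepable_held > 0)
		sleep_violations++;
	t->running = false;	/* pretend the handler has finished */
}

/* Shape of the wrapper before the patch: cancel, then drain if running. */
static int
cancel_old(struct fake_task *t)
{
	if (fake_cancel(t))
		fake_drain(t);
	return (0);
}

/* Shape of the wrapper after the patch: cancel only, never wait. */
static int
cancel_new(struct fake_task *t)
{
	return (fake_cancel(t) != 0);
}

/* Run both variants with one lock "held"; returns the violations seen. */
int
demo(void)
{
	struct fake_task a = { .pending = true, .running = true };
	struct fake_task b = { .pending = true, .running = true };

	nonsleepable_held = 1;
	sleep_violations = 0;
	cancel_old(&a);			/* drains under the lock */
	assert(sleep_violations == 1);
	(void)cancel_new(&b);		/* does not drain */
	assert(sleep_violations == 1);
	nonsleepable_held = 0;
	return (sleep_violations);
}
```

In this model only the old shape ever reaches the may-sleep call, which is
the path witness complained about in the backtrace; the new shape just
reports the cancel result and leaves any still-running handler alone.
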
The patch LGTM (I haven't hit the issue after 10 minutes of use; generally
it pops up almost immediately after boot or within the first couple of
minutes).

Thanks a million!
-Garrett
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"