Hi, @Kristof: The current value of 'net.link.epair.netisr_maxqlen' is 2100; I will change it to 210. Will this require a reboot, or can I just change the sysctl and reload the epair module?

@Bjoern: here is the output of 'netstat -Q':

```
# netstat -Q
Configuration:
Setting                        Current        Limit
Thread count                         1            1
Default queue limit                256        10240
Dispatch policy                 direct          n/a
Threads bound to CPUs         disabled          n/a

Protocols:
Name     Proto  QLimit  Policy  Dispatch  Flags
ip           1     256    flow   default    ---
igmp         2     256  source   default    ---
rtsock       3     256  source   default    ---
arp          4     256  source   default    ---
ether        5     256  source    direct    ---
ip6          6     256    flow   default    ---
epair        8    2100     cpu   default    CD-

Workstreams:
WSID CPU   Name     Len  WMark      Disp'd  HDisp'd   QDrops     Queued    Handled
   0   0   ip         0     30    11409267        0        0   13574317   24983409
   0   0   igmp       0      0           0        0        0          0          0
   0   0   rtsock     0      1           0        0        0         42         42
   0   0   arp        0      0    61109751        0        0          0   61109751
   0   0   ether      0      0   115098020        0        0          0  115098020
   0   0   ip6        0     10    36157577        0        0    4273274   40430846
   0   0   epair      0   2100           0        0   210972  303785724  303785724
```

I still have access to a machine in this state, but I will need to reset it to a working state soon. Please let me know if there is any information you would like me to collect from this machine before I reset it.

Best,
Reshad
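P.S. On the reboot question, my working assumption (untested) is that the netisr handler captures the queue limit when the epair module registers it, so writing the sysctl on the live system may not take effect on its own, but a module reload should pick it up without a full reboot. Something like:

```
# Try the plain sysctl first; if the oid is a read-only tunable this
# will fail with "read only" and the value has to be staged instead.
sysctl net.link.epair.netisr_maxqlen=210

# Stage the value in the kernel environment and reload the module so
# the netisr registration picks it up (any existing epairNa/epairNb
# pairs have to be destroyed before the module will unload).
kenv net.link.epair.netisr_maxqlen=210
kldunload if_epair && kldload if_epair

netstat -Q | grep epair    # QLimit should now read 210
```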
On 27 March 2018 8:18:29 PM IST, "Bjoern A. Zeeb" <bzeeb-li...@lists.zabbadoz.net> wrote:
>On 27 Mar 2018, at 14:40, Kristof Provost wrote:
>
>> (Re-cc freebsd-net, because this is useful information)
>>
>> On 27 Mar 2018, at 13:07, Reshad Patuck wrote:
>>> The epair crash occurred again today running the epair module code
>>> with the added dtrace sdt providers.
>>>
>>> Running the same command as last time, 'dtrace -n ::epair\*:'
>>> returns the following:
>>> ```
>>> CPU     ID                    FUNCTION:NAME
>> …
>>>   0  66499    epair_transmit_locked:enqueued
>>> ```
>>>
>>> Looks like it’s filled up a queue somewhere and is dropping
>>> connections after that.
>>>
>>> The value of 'error' is 55. I can see both the ifp and m structs
>>> but don’t know what to look for in them.
>>>
>> That’s useful. Error 55 is ENOBUFS, which in IFQ_ENQUEUE() means
>> we’re hitting _IF_QFULL().
>> There don’t seem to be counters for that drop though, so that makes
>> it hard to diagnose without these extra probe points.
>> It also explains why you don’t really see any drop counters
>> incrementing.
>>
>> The fact that this queue is full presumably means that the other side
>> is not reading packets off it any more.
>> That’s supposed to happen in epair_start_locked() (look for the
>> IFQ_DEQUEUE() calls).
>>
>> It’s not at all clear to me how, but it looks like the receive side
>> is not doing its work.
>>
>> It looks like the IFQ code is already a fallback for when the netisr
>> queue is full.
>> That code might be broken, or there might be a different issue that
>> will just mean you’ll always end up in the same situation,
>> regardless of queue size.
>>
>> It’s probably worth trying to play with
>> ‘net.route.netisr_maxqlen’. I’d recommend *lowering* it, to see
>> if the problem happens more frequently that way. If it does, it’ll be
>> helpful in reproducing and trying to fix this. If it doesn’t, the
>> full queues are probably a consequence rather than a cause/trigger.
>> (Of course, once you’ve confirmed that lowering the netisr_maxqlen
>> makes the problem more frequent, go ahead and increase it.)
>
>netstat -Q will be useful
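For reference, this is the sequence I plan to use for the lowering experiment suggested above, under the same staged-reload assumption as in the P.S.; 64 is just an arbitrary low value to make the queue fill faster:

```
ifconfig epair0a destroy                  # destroy existing pairs first
kenv net.link.epair.netisr_maxqlen=64     # small queue, hits the limit sooner
kldunload if_epair && kldload if_epair
ifconfig epair create                     # recreates an epairNa/epairNb pair
# ...reproduce the workload, then watch whether the drops reappear:
netstat -Q | grep epair
```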