OK, I submitted a bug report in case anyone else runs into a similar problem:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209351
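
For anyone who wants to compare their own numbers against the "netstat -w1 -I vlan6" samples quoted further down in this thread, here is a minimal sketch in Python that totals the output packets and output errs columns. It is only an illustration: the names parse_samples and output_error_summary are made up for this example, the eight-column layout is assumed from the quoted output, and the resulting ratio is the interface error counter per packet sent, which is not the same measurement as the ping/iperf loss rate reported in the thread.

```python
from dataclasses import dataclass

# Rows copied from the "netstat -w1 -I vlan6" output quoted in this thread.
SAMPLE = """\
            input          vlan6           output
   packets  errs idrops      bytes    packets  errs      bytes colls
         1     0      0         66      30557     2   33310968     0
         1     0      0        105      31458     3   33912219     0
         2     0      0       2954      32001     8   34983986     0
         1     0      0       1512      33150     6   35942558     0
         1     0      0       1512      33654     4   37311862     0
         1     0      0       1512      34825     3   38213793     0
         3     0      0       1683      35376     4   39488912     0
         5     0      0       7280      32423     3   35551869     0
"""


@dataclass
class Sample:
    """One one-second row; column order is assumed from the quoted output."""
    in_packets: int
    in_errs: int
    in_drops: int
    in_bytes: int
    out_packets: int
    out_errs: int
    out_bytes: int
    colls: int


def parse_samples(text):
    """Return a Sample for every row that consists of eight integers."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Header rows are skipped: they are not made of eight numeric fields.
        if len(fields) != 8 or not all(f.isdigit() for f in fields):
            continue
        samples.append(Sample(*map(int, fields)))
    return samples


def output_error_summary(samples):
    """Return (total output packets, total output errs, errs per packet)."""
    total_packets = sum(s.out_packets for s in samples)
    total_errs = sum(s.out_errs for s in samples)
    ratio = total_errs / total_packets if total_packets else 0.0
    return total_packets, total_errs, ratio


if __name__ == "__main__":
    packets, errs, ratio = output_error_summary(parse_samples(SAMPLE))
    print(f"output packets={packets} output errs={errs} errs/packet={ratio:.6f}")
```

To use it on a live system, the SAMPLE string could be replaced with captured netstat output; the parser ignores any line that is not eight integers, so the periodically repeated header lines should not matter.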

2016-04-27 18:10 GMT-03:00 Zé Claudio Pastore <zclau...@bsd.com.br>:

> Hello Ryan,
>
> 2016-04-27 17:28 GMT-03:00 Ryan Stone <ryst...@gmail.com>:
>
>> From a quick look at the vlan code, I can identify a few cases that might
>> cause that counter to increment:
>>
>> 1) Error from the underlying ixgbe device. Does "netstat -dI ix0" show
>> that the driver has been dropping packets?
>>
>
> No, it does not increase drop counters on ix port, only on the vlan device.
>
>>
>> 2) Link down events on the underlying NIC. I believe that link flaps
>> will be logged to /var/log/messages and dmesg; do you see anything there
>> that might correspond to the time of the packet drops?
>>
>
> No, dmesg is clean, only a couple down/up link when I actually did
> disconnect the port, and no other message on /var/log/messages that grabs
> my attention.
>
>>
>> 3) If VLAN_HWTAGGING is disabled through ifconfig on the port, then in
>> theory a low memory event could cause the packet to be dropped. Does
>> "netstat -m" show that "requests for mbufs denied" increasing?
>>
>
> Here is the ifconfig -v output for the vlan6 on the 10.1-STABLE system
>
> vlan6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=303<RXCSUM,TXCSUM,TSO4,TSO6>
>         ether a0:36:9f:2a:6d:ae
>         inet6 fe80::a236:9fff:fe2a:6dae%vlan6 prefixlen 64 scopeid 0x19
>         inet6 2804:1054:bad:b1fe::1 prefixlen 64
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>         media: Ethernet autoselect (10Gbase-SR <full-duplex>)
>         status: active
>         vlan: 3005 parent interface: ix3
>         groups: vlan
>
> And here it is on the 10.3-STABLE system, I dont know why the only
> difference is no options were printed on the newer system, everything else
> is the same.
>
> vlan6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         ether a0:36:9f:2a:6d:ae
>         inet6 fe80::a236:9fff:fe2a:6dae%vlan6 prefixlen 64 scopeid 0x19
>         inet6 2804:1054:bad:b1fe::1 prefixlen 64
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>         media: Ethernet autoselect (10Gbase-SR <full-duplex>)
>         status: active
>         vlan: 3005 parent interface: ix3
>         groups: vlan
>
> This is the netstat -m output when system has packet loss. Denied and
> delayed counters are zeroed.
>
> % netstat -m
> 12365/21040/33405 mbufs in use (current/cache/total)
> 12310/14530/26840/505076 mbuf clusters in use (current/cache/total/max)
> 12310/14508 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/225/225/252538 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/74826 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/42089 16k jumbo clusters in use (current/cache/total/max)
> 27711K/35220K/62931K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
>
>>
>> On Wed, Apr 27, 2016 at 2:41 PM, Zé Claudio Pastore <zclau...@bsd.com.br>
>> wrote:
>>
>>> Hello,
>>>
>>> On a BGP border router I help manage, we run FreeBSD 10.1-STABLE,
>>> version r281235 and it works fine for several years now.
>>>
>>> We have around 4Gbit/s and 1.8Mpps routed on peak while per port interface
>>> we peak at 300Kpps.
>>>
>>> Our quality metrics are measured with:
>>>
>>> ping -s 1472 -i 0.1 <our-other-ibgp-router>
>>>
>>> As well as iperf bidirecional.
>>>
>>> This metric is similar to what Speedy Test and SIMET tests are done and our
>>> customers reference.
>>>
>>> Systems working w/o problem:
>>> - 10.1-STABLE / r281235
>>>
>>> Systems tested with drops:
>>> - 10.2-STABLE / r292035M
>>> - 10.3-STABLE / r298705
>>> - 11.0-CURRENT / r295683 (downloaded snapshot from ftp.freebsd.org)
>>> - 11.0-CURRENT Melifaro Routing Branch / r297731M
>>>
>>> While testing, when errors happen I can see output errs on the vlan port on
>>> the output from "netstat -w1 -I vlan6"
>>>
>>>             input          vlan6           output
>>>    packets  errs idrops      bytes    packets  errs      bytes colls
>>>          1     0      0         66      30557     2   33310968     0
>>>          1     0      0        105      31458     3   33912219     0
>>>          2     0      0       2954      32001     8   34983986     0
>>>          1     0      0       1512      33150     6   35942558     0
>>>          1     0      0       1512      33654     4   37311862     0
>>>          1     0      0       1512      34825     3   38213793     0
>>>          3     0      0       1683      35376     4   39488912     0
>>>          5     0      0       7280      32423     3   35551869     0
>>>
>>> Problems may happen under high load (~200Kpps) or low load (~30Kpps) on a
>>> vlan port. The observed frame loss never happens on untagged ports, only
>>> vlan related. The observed loss happens with packets sized 900 bytes and
>>> above but noticeably loss rate is higher with packets close to 1400 (1472
>>> is my reference size).
>>>
>>> Loss rate on all listed systems different from r281235 is 9-19% with
>>> ping(1) and iperf, while it's 0% on r281235.
>>>
>>> First I believed it to be a Intel driver error on systems newer than 10.1.
>>> My reference card are dual port 82599EB 10-Gigabit SFI/SFP+ Network
>>> Connection (2x2 on x8 PCIe bus, total 4x10G). But yesterday I replaced
>>> Intel by Chelsio T5 and the problem is still exactly the same, so it's not
>>> related to card vendor.
>>>
>>> I always test the very same hardware, I have two SSD drives in this router,
>>> one for the 10.1 which just runs fine and the other disk to test the
>>> various versions of FreeBSD.
>>>
>>> Only minor loader and sysctl confs are tweaked:
>>>
>>> kern.hz=2000
>>> net.inet.ip.redirect=1             # do not send IP redirects
>>> net.inet.ip.accept_sourceroute=0   # drop source routed packets since they ca
>>> net.inet.ip.sourceroute=0          # if source routed packets are accepted th
>>> net.inet.tcp.drop_synfin=1         # SYN/FIN packets get dropped on initial c
>>> net.inet.udp.blackhole=1           # drop udp packets destined for closed soc
>>> net.inet.tcp.blackhole=2           # drop tcp packets destined for closed por
>>> security.bsd.see_other_uids=0
>>>
>>> Can anyone suggest what might be a fix/tuning for this behavior? Was there
>>> any relevant change on vlan code from particular revisions close to the one
>>> I run on 10.1 and later which would lead to such a big difference?
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"