2009/8/1  <alexpalias-bsd...@yahoo.com>:
> Good day
>
> I'm running a FreeBSD 7.2 router and I am seeing a lot of input errors on one 
> of the em interfaces (em0), coupled with (at approximately the same times) 
> much fewer errors on em1 and em2.  Monitoring is done with SNMP from another 
> machine, and the CPU load as reported via SNMP is mostly below 30%, with a 
> couple of spikes up to 35%.
>
> Software description:
>
> - FreeBSD 7.2-RELEASE-p2, amd64
> - bsnmpd with modules: hostres and (from ports) snmp_ucd
> - quagga 0.99.12 (running only zebra and bgpd)
> - netgraph (ng_ether and ng_netflow)
>
> Hardware description:
>
> - Dell machine, dual Xeon 3.20 GHz, 4 GB RAM
> - 2 x built-in gigabit interfaces (em0, em1)
> - 1 x dual-port gigabit interface, PCI-X (em2, em3) [see pciconf near the end]
>
>
> The machine receives the global routing table ("netstat -nr | wc -l" gives 
> 289115 currently).
>
> All of the em interfaces are just configured "up", with various vlan 
> interfaces on them.  Note that I use "kpps" to mean "thousands of packets per 
> second", sorry if that's the wrong shorthand.
>
> - em0 sees a traffic of 10...22 kpps in, and 15...35 kpps out.  In bits, it's 
> 30...120Mbits/s in, and 100...210Mbits/s out.  Vlans configured are vlan100 
> and vlan200, and most of the traffic is on vlan100 (vlan200 sees 4kpps in / 
> 0.5kpps out maximum, with the average at about one third of this).  em0 is 
> the external interface, and its traffic corresponds to the sum of traffic 
> through em1 and em2.
>
> - em1 has 5 vlans, and sees about 22kpps in / 11kpps out (maximum)
>
> - em2 has a single VLAN, and sees about 4...13kpps both in and out (almost 
> equal in/out during most of the day)
>
> - em3 is a backup interface, with 2 VLANS, and is the only one which has seen 
> no errors.
>
> Only the vlans on em0 are analyzed by ng_netflow, and the errors I'm seeing 
> have started appearing days before netgraph was even loaded in the kernel.
>
> Tuning done:
>
> /boot/loader.conf:
> hw.em.rxd=4096
> hw.em.txd=4096
>
> Without the above we were seeing far more errors; they are now reduced, but
> still come in bursts of over 1000 errors on em0.
>
> /etc/sysctl.conf:
> net.inet.ip.fastforwarding=1
> dev.em.0.rx_processing_limit=300
> dev.em.1.rx_processing_limit=300
> dev.em.2.rx_processing_limit=300
> dev.em.3.rx_processing_limit=300
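
[Editor's note: the settings above are ordinary sysctl knobs, so — given the
poster's no-reboot constraint — they can be changed and verified on the live
system with sysctl(8). A sketch:]

```shell
# Apply on the running system (no reboot needed):
sysctl net.inet.ip.fastforwarding=1
sysctl dev.em.0.rx_processing_limit=300
# Verify the current value of a knob:
sysctl dev.em.0.rx_processing_limit
```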
>
> Still seeing errors; after some searching of the mailing lists we also added:
>
> # the four lines below are repeated for em1, em2, em3
> dev.em.0.rx_int_delay=0
> dev.em.0.rx_abs_int_delay=0
> dev.em.0.tx_int_delay=0
> dev.em.0.tx_abs_int_delay=0
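
[Editor's note: since the same four lines are repeated for em1-em3, the full
set of commands can be generated with a small loop — a sketch; pipe the
output to sh(1) to actually apply it on the live system:]

```shell
# Emit the sysctl commands that zero the interrupt-delay knobs on all
# four em interfaces; pipe to sh(1) to apply them without a reboot.
for i in 0 1 2 3; do
  for knob in rx_int_delay rx_abs_int_delay tx_int_delay tx_abs_int_delay; do
    echo "sysctl dev.em.${i}.${knob}=0"
  done
done
```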
>
> Still getting errors, so I also added:
>
> net.inet.ip.intr_queue_maxlen=4096
> net.route.netisr_maxqlen=1024
>
> and
>
> kern.ipc.nmbclusters=655360
>
>
> Also tried with rx_processing_limit set to -1 on all em interfaces, still 
> getting errors.
>
> Looking at the shape of the error and packet graphs, there seems to be a 
> correlation between the number of packets per second on em0 and the height of 
> the error "spikes" on the error graph.  These spikes are spread throughout 
> the day, separated by gaps (periods with no errors) of various lengths (10 
> minutes to 2 hours within the last 24 hours), but sometimes there are errors 
> even at the lowest-kpps times of the day.
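
[Editor's note: to see the spikes as they happen, rather than from the SNMP
graphs, the counters can be watched live with standard FreeBSD tools;
interface em0 as above:]

```shell
# Print em0 packet/error counters once a second:
netstat -w 1 -I em0
# Cumulative interrupt counts per device; run twice and compare
# to estimate interrupt rates:
vmstat -i | grep em
```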
>
> em0 and em1 error times are correlated, with all errors on the graph for em0 
> having a smaller corresponding error spike on em1 at the same time, and 
> sometimes an error spike on em2.
>
> The old router was seeing about the same traffic, and had em0, em1, re0 and 
> re1 network cards, and was only seeing errors on the em cards.  It was 
> running 7.2-PRERELEASE/i386.
>
>
> Any suggestions would be greatly appreciated.  Please note that this is a 
> live router, and I can't reboot it (unless absolutely necessary).  Tuning 
> that can be applied without rebooting will be tried first.


Is this still an issue?
You didn't mention whether you are using pf or another firewall.
I had a similar problem with two boxes replicating zfs pools, when I
noticed input errors.
After some investigation it turned out to be pf overhead, even though I
had set skip on the interfaces carrying the zfs send/recv traffic.

With pf enabled (and skip set) I can copy at 50-80 MB/s with 50-80 kpps
and 0-100+ input drops per second.
With pf disabled I can copy steadily at 93-102 MB/s and 110-131 kpps,
with few drops (because one CPU is almost fully consumed).
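
A quick, reboot-free way to check whether pf is involved (pfctl(8)
commands, assuming pf is loaded at all):

```shell
pfctl -s info             # rule/state match counters; confirms pf is active
pfctl -s states | wc -l   # current size of the state table
pfctl -d                  # disable pf temporarily
# ...repeat the traffic test and watch input errors on em0...
pfctl -e                  # re-enable pf
```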





-- 
Artis Caune

    Everything should be made as simple as possible, but not simpler.
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
