Greetings.

--- On Mon, 8/17/09, Дмитрий Замураев <gigabyte....@gmail.com> wrote:
> From: Дмитрий Замураев <gigabyte....@gmail.com>
> Subject: RE: em driver input errors
> To: alexpalias-bsd...@yahoo.com
> Cc: freebsd-net@freebsd.org
> Date: Monday, August 17, 2009, 6:17 PM
>
> > /boot/loader.conf:
> > hw.em.rxd=4096
> > hw.em.txd=4096
>
> why are you using these values? try the defaults (without these
> lines in loader.conf)

As said in my original email, I was getting way more errors with the
defaults.

> > Without the above we were seeing way more errors, now they are
> > reduced, but still come in bursts of over 1000 errors on em0.
> >
> > Still seeing errors, after some searching the mailing lists we
> > also added:
> >
> > # the four lines below are repeated for em1, em2, em3
> > dev.em.0.rx_int_delay=0
> > dev.em.0.rx_abs_int_delay=0
> > dev.em.0.tx_int_delay=0
> > dev.em.0.tx_abs_int_delay=0
>
> try to increase rx_int_delay to 600 and rx_abs_int_delay to 1000,
> tx_*_delay without changes -> by default (100?)

Thanks for the suggestion. From a "clean" box:

dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_abs_int_delay: 66

I reset all the values (errors still appearing), then tried your
suggestion (rx_int_delay=600, rx_abs_int_delay=1000). This reduced the
number of interrupts for em0 (from about 7200/sec to around 6500/sec).
After some time, I started getting errors again. But that made me try
this as well:

dev.em.0.tx_int_delay=600
dev.em.0.tx_abs_int_delay=1000

That is, using your suggested values for tx too. Now em0 is seeing
about 1800 interrupts/second, which is way better, but after some time
I saw errors again...

From the output of "netstat -nI em0 -w 5":

            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
     87267     0   50372599     106931     0   81598993     0
     86496     0   50990332     105467     0   80064657     0
     81726  3056   49876613      99080     0   73273640     0
     90425     0   59172531     105299     0   77110096     0
    120292     0   70369292     109597     0   78626248     0
... a few minutes pass with zero errors ...
     89646     0   56951878     111240     0   86493393     0
     86031     0   53549721     108695     0   83592747     0
     77760  3054   48505562      96912     0   73185576     0
     87508     0   56116394     106094     0   79130608     0
     89031     0   56490982     103039     0   77398567     0

What's interesting is that I'm seeing errors in the 80k packets/5 sec
zone (so around 16k packets/s), but no errors at 120k packets/5 sec
(24 kpps).

Currently, I've set the delay to 600 and abs_delay to 1000 on all
interfaces (em0, em1, em2, em3), thus reducing the number of
interrupts. I'm currently seeing (in systat -vmstat 2):

Around 1800 irqs/s for em0, 1800 for em1, 1800 for em2, under 10/s
for em3.
Around 2000 irqs/s for cpu0:time, 2000 more for cpu1:time, 2000 for
cpu2:time and 2000 for cpu3:time.
Interrupts total (as reported by systat): around 13500/second.

I would estimate the old IRQ load at around 30000-35000/second, which
doesn't seem too much to me for a dual Xeon machine.

> > kern.ipc.nmbclusters=655360
>
> no need. see netstat -m

Thanks, but as I said, I did try almost *EVERYTHING* I could without
rebooting. Including this.

Speaking of which, I did compile the kernel with "options
DEVICE_POLLING", but enabling polling only made the errors appear more
often, and in greater numbers.

> P.S. change copper cable, turn off the flow-control (if is on)

There are 4 em interfaces on this machine, with new cat6 cables. 2
more em interfaces on another machine that was seeing the same errors
(the old router), on different cables. And 2 more em interfaces on
another machine that's in production, also with new cables.

Of the input errors (as broken down by sysctl dev.em.0.stats=1, then
reading dmesg), only 2 are due to CRC errors, as opposed to around
2,500,000 from other causes. I tend to feel the cable isn't the
problem.

Flow control is off, I just checked. I forgot about that one, thanks
for reminding me.
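
As a side note, the per-second figures quoted above can be reproduced
from the raw netstat samples with a few lines of Python. This is only
a sketch: the function name summarize is made up for illustration, and
it assumes the seven-column layout printed by "netstat -nI em0 -w 5"
above and the 5-second interval given by -w 5.

```python
# Illustrative sketch only: "summarize" is a hypothetical helper, and the
# column order is assumed to match the netstat output quoted in this mail.
INTERVAL_S = 5  # sampling interval from "netstat ... -w 5"


def summarize(row: str, interval_s: int = INTERVAL_S) -> dict:
    """Turn one netstat -w sample line into per-second input figures."""
    fields = [int(x) for x in row.split()]
    if len(fields) != 7:
        raise ValueError("expected 7 numeric columns, got %d" % len(fields))
    in_pkts, in_errs, in_bytes, _out_pkts, _out_errs, _out_bytes, _colls = fields
    return {
        "in_pps": in_pkts / interval_s,                 # input packets per second
        "in_errs_per_s": in_errs / interval_s,          # input errors per second
        "in_mbit_s": in_bytes * 8 / interval_s / 1e6,   # input megabits per second
    }


# Three of the sample rows copied from the netstat output in this mail.
samples = [
    "81726 3056 49876613 99080 0 73273640 0",
    "120292 0 70369292 109597 0 78626248 0",
    "77760 3054 48505562 96912 0 73185576 0",
]

for row in samples:
    s = summarize(row)
    print("%8.0f pps  %6.1f errs/s  %6.1f Mbit/s"
          % (s["in_pps"], s["in_errs_per_s"], s["in_mbit_s"]))
```

The two rows with errors come out at roughly 16k packets/s, while the
error-free 120292-packet row is roughly 24k packets/s, matching the
observation above.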

Thank you for your help,
Alex

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"