Hi,

On Fri, Oct 7, 2011 at 3:24 PM, Mike Tancsa <m...@sentex.net> wrote:
> On 10/7/2011 2:59 PM, Jason Wolfe wrote:
>> Mike,
>>
>> I had a large pool of servers running 7.2.3 with MSI-X enabled during my
>> testing, but it didn't resolve the issue. I just pulled back the
>> sys/dev/e1000 directory from 8-STABLE and ran it on 8-RELEASE-p2 though, so
>> if there were changes made outside of the actual driver code that helped I
>> may have not seen the benefit. It's possible the lagg is adding some
>> complication, but when one of the interfaces wedge the lagg continues to
>> operate over the other link (though half of the traffic simply fails). It
>> appears the interface just runs out of one of its buffers, and is helpless
>> to resolve it without a bounce.
>>
>> I do recall coming across the ASPM threads, but my Supermicro boards didn't
>> have the option and many people claimed it didn't resolve it, so I didn't
>> follow through. I'll do a bit more digging there, thanks.
>>
>> Disabling MSI-X has without a doubt completely resolved my problem though. I
>> would receive about 30 reports/failures a day from my servers when I was
>> running with it, since disabling it I haven't received a single one in ~40
>> days. The servers are currently running with the 7.2.3 driver also, so if
>> nothing jumps out from my original email I'm happy to re enable it on a
>> handful of servers and collect some fresh reports.
>
> Hi Jason,
> This sounds like a real drag :( You certainly have WAY more servers to
> sample from than I do/did (a couple). The problem on my boxes were not
> very frequent to start with, so it would take a while. But the symptoms
> were very similar in that I would see queue overruns in the stats when
> things were wedged. I have other em nics (non 82574) that get the odd
> overrun when they are busy, but they seem to recover from the situation
> just fine. The 82574 did not.
>
> When you disable MSI-X, you mean via hw.pci.enable_msix=0 across the
> board, or you disable multi-queue for the NIC, so it uses just one
> interrupt, rather than separate ones for xmit and recv ?
>
em(4)'s multiqueue is misleading. By default, with MSI-X enabled, before
April 2010 (AFAIK) it used 2 * (RX+TX) queue + 1, i.e. 5 MSI-X vectors[0].
After April 2010, it uses 1 * (RX+TX) queue + 1, i.e. 3 MSI-X vectors.
There is no logic for the driver to use 1 vector with MSI-X enabled.

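To make the vector arithmetic above concrete, here is a minimal sketch in C. It assumes only what the paragraph states: each queue pair takes one RX and one TX vector, plus one extra vector shared by the whole device. The function name em_msix_vectors() is hypothetical and is not taken from the em(4) source.

```c
#include <assert.h>

/*
 * Hypothetical helper, NOT part of the em(4) driver: computes the number
 * of MSI-X vectors from the formula quoted in this thread,
 * queue_pairs * (RX + TX) + 1.
 */
static int
em_msix_vectors(int queue_pairs)
{
	/* A negative queue count makes no sense for this formula. */
	assert(queue_pairs >= 0);

	/* One RX vector and one TX vector per pair, plus one extra. */
	return (queue_pairs * 2 + 1);
}
```

With 2 queue pairs (the pre-April-2010 default described above) this gives 5 vectors, and with 1 queue pair it gives 3, which matches the numbers in the text. No input yields 1 vector once at least one queue pair exists, in line with the remark that the driver has no single-vector MSI-X mode.
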
As a side note, the only gain of EM_MULTIQUEUE, now, is to allow the
driver to use the buf_ring(9) lockless queue API, instead of the locked
ifq. Today, em(4) should waste about 16k of memory when !EM_MULTIQUEUE.
This is the memory, 4096 * sizeof(void *), allocated for the buf_ring(9)
structure, which is not used in the !EM_MULTIQUEUE case.

> Also, what is the purpose of
> hw.pci.do_power_nodriver=3 vs 0 (3 means put absolutely everything
> in D3 state.)
>
> net.link.ifqmaxlen 1024 vs 50 (does anything else need to be adjusted if
> this value is increased?)
>
He might as well try to enable EM_MULTIQUEUE.

> hw.em.rxd="2048"
> hw.em.txd="2048"
>
As is becoming well known here, I am not a fan of bumping a limit to hide
a bug. So I'd rather lower this to 512 or 256, and hope it triggers the
issue more often, so that it can be diagnosed and fixed for good.

 - Arnaud

[0]: actually it depends on a field in the chip NVM, which can be up to 4
(0-based accounting, which would translate to 5 vectors), but happened to
be 2 (3 vectors) on the 82574 I've got access to. Last time I checked, this
setting could not be seen with the standard NVM dump sysctl, which limits
the output's size. On those chips, the pre-April-2010 code would fall back
to MSI even if 3 vectors were available.

> Have you tried leaving these two at the default on 7.2.3 ?
> if_em.h implies 1024 for each.
>
>        ---Mike
>
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, m...@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada http://www.tancsa.com/
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"