Scott Long wrote:
Michael Vince wrote:
Kris Kennaway wrote:
On Tue, Nov 22, 2005 at 08:54:49PM -0800, John Polstra wrote:
On 23-Nov-2005 Kris Kennaway wrote:
I am seeing the em driver undergoing an interrupt storm whenever the
amr driver receives interrupts. In this case I was running newfs on
the amr array and em0 was not in use:
28 root 1 -68 -187 0K 8K CPU1 1 0:32 53.98% irq16: em0
36 root 1 -64 -183 0K 8K RUN 1 0:37 27.75% irq24: amr0
# vmstat -i
interrupt total rate
irq1: atkbd0 2 0
irq4: sio0 199 1
irq6: fdc0 32 0
irq13: npx0 1 0
irq14: ata0 47 0
irq15: ata1 931 5
irq16: em0 6321801 37187
irq24: amr0 28023 164
cpu0: timer 337533 1985
cpu1: timer 337285 1984
Total 7025854 41328
When newfs finished (i.e. amr was idle), em0 stopped storming.
MPTable: <INTEL SE7520BD22 >
This is the dreaded interrupt aliasing problem that several of us have
experienced with this chipset. High-numbered interrupts alias down to
interrupts in the range 16..19 (or maybe 16..23), a multiple of 8 less
than the original interrupt.
Nobody knows what causes it, and nobody knows how to fix it.
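To make the "multiple of 8 less" rule concrete, here is a minimal Python sketch that lists which low IRQs a given high-numbered IRQ could alias down to. The function name alias_candidates and the default 16..23 window are illustrative assumptions taken from the description above; nothing here is derived from the chipset documentation.

```python
def alias_candidates(irq, low=16, high=23):
    """Return IRQs in [low, high] that are a positive multiple of 8 below irq.

    The [low, high] window is an assumption based on the observed
    16..19 (or maybe 16..23) range, not a documented hardware property.
    """
    return [t for t in range(low, high + 1)
            if irq > t and (irq - t) % 8 == 0]


# IRQ numbers that appear in this thread.
for irq in (24, 46, 64):
    print(irq, alias_candidates(irq))
```

Under this rule irq24 (amr0) and irq64 (em0) both map to irq16, while irq46 maps to irq22, which is outside the narrower 16..19 window.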
This would be good to document somewhere so that people don't either
accidentally buy this hardware, or know what to expect when they run
it.
Kris
This is Intel's latest server chipset design, and Dell is putting
that chipset in all their servers.
Luckily I haven't seen the problem on any of my Dell servers (as
long as I am looking at this right).
This server has been running for a long time.
vmstat -i
interrupt total rate
irq1: atkbd0 6 0
irq4: sio0 23433 0
irq6: fdc0 10 0
irq8: rtc 2631238611 128
irq13: npx0 1 0
irq14: ata0 99 0
irq16: uhci0 1507608958 73
irq18: uhci2 42005524 2
irq19: uhci1 3 0
irq23: atapci0 151 0
irq46: amr0 41344088 2
irq64: em0 1513106157 73
irq0: clk 2055605782 99
Total 7790932823 379
This one just transferred over 8 GB of data in 77 seconds with around
1000 simultaneous TCP connections under a load of 35. Both seem OK.
vmstat -i
interrupt total rate
irq4: sio0 315 0
irq13: npx0 1 0
irq14: ata0 47 0
irq16: uhci0 2894669 2
irq18: uhci2 977413 0
irq23: ehci0 3 0
irq46: amr0 883138 0
irq64: em0 2890414 2
cpu0: timer 2763566717 1999
cpu3: timer 2763797300 1999
cpu1: timer 2763551479 1999
cpu2: timer 2763797870 1999
Total 11062359366 8004
Mike
Looks like at least some of your interrupts are being aliased to
irq16, which just happens to be USB (uhci) in this case. Note that the
rate is the same between irq64 and irq16, and the totals are pretty
close. If you don't need USB, I'd suggest turning it off.
Scott
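The check described above (two IRQs a multiple of 8 apart with nearly equal totals) can be sketched as a small Python script over vmstat -i output. This is only a heuristic under stated assumptions: the function names, the 5% tolerance, and the min_total cutoff are all hypothetical choices, and the sample is a subset of the first listing above.

```python
import re

# Matches lines such as "irq16: em0 uhci0 917039 11".
IRQ_LINE = re.compile(r"^irq(\d+):\s+(.+?)\s+(\d+)\s+(\d+)\s*$")


def parse_vmstat_i(text):
    """Map IRQ number -> (device names, total, rate) for each irqN line."""
    rows = {}
    for line in text.splitlines():
        m = IRQ_LINE.match(line.strip())
        if m:
            rows[int(m.group(1))] = (m.group(2), int(m.group(3)), int(m.group(4)))
    return rows


def alias_suspects(rows, tolerance=0.05, min_total=1000):
    """Return (high_irq, low_irq) pairs that differ by a multiple of 8
    and whose totals agree within the given relative tolerance."""
    suspects = []
    for high, (_, high_total, _) in rows.items():
        for low, (_, low_total, _) in rows.items():
            if high <= low or (high - low) % 8 != 0:
                continue
            if min(high_total, low_total) < min_total:
                continue  # too few interrupts to compare meaningfully
            if abs(high_total - low_total) <= tolerance * max(high_total, low_total):
                suspects.append((high, low))
    return sorted(suspects)


SAMPLE = """\
irq8: rtc 2631238611 128
irq14: ata0 99 0
irq16: uhci0 1507608958 73
irq18: uhci2 42005524 2
irq46: amr0 41344088 2
irq64: em0 1513106157 73
irq0: clk 2055605782 99
"""

print(alias_suspects(parse_vmstat_i(SAMPLE)))  # prints [(64, 16)]
```

Note that this only catches the near-equal-totals pattern; it would not flag the original storm report, where em0 on irq16 counted far more interrupts than amr0 on irq24.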
Most of my Dell servers occasionally use the USB ports to serial out via
tip using a usb2serial cable with the uplcom driver and then into
another server's real serial port (sio), so it's not really an option to
disable USB.
How much do you think it affects performance if the USB device is
actually rarely used?
I also have a 6-stable machine and noticed that the vmstat -i output
lists em and USB together on irq16, but em0 isn't used at all; em2 and
em3 are the active ones. It doesn't seem reasonable that my USB serial
usage would be that high for irq16. Could it be that em2 and em3 are
also going through irq16?
vmstat -i
interrupt total rate
irq4: sio0 228 0
irq14: ata0 47 0
irq16: em0 uhci0 917039 11
irq18: uhci2 54823 0
irq23: ehci0 3 0
irq46: amr0 45998 0
irq64: em2 898628 11
lapic0: timer 159140889 1999
Total 161057655 2024
Mike
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net