Oliver Brandmueller wrote:
Hi,
On Wed, Sep 27, 2006 at 08:00:21AM +0200, Martin Nilsson wrote:
I get tons of these:
em0: watchdog timeout -- resetting
em0: link state changed to DOWN
em0: link state changed to UP
mailbox# pciconf -lv
[EMAIL PROTECTED]:0:0: class=0x020000 card=0x108c15d9 chip=0x108c8086 rev=0x03
hdr=0x00
vendor = 'Intel Corporation'
device = 'PRO/1000 PM'
class = network
subclass = ethernet
[EMAIL PROTECTED]:0:0: class=0x020000 card=0x109a15d9 chip=0x109a8086 rev=0x00
hdr=0x00
vendor = 'Intel Corporation'
class = network
subclass = ethernet
[...]
I have only seen them on em0. Yesterday I tried sysutils/cpuburn on
similar boxes that are netbooted with NFS-mounted drives, and every time I
loaded the two CPU cores the network went down.
I see the same.
Very much so on this one, where I work around the problem by using polling;
it's a UP machine.
FreeBSD nessie 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #3: Fri Sep 15 09:48:36
CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/NESSIE i386
[EMAIL PROTECTED]:1:0: class=0x020000 card=0x10198086 chip=0x10198086
rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = '82547EI Gigabit Ethernet Controller (LOM)'
class = network
subclass = ethernet
irq18: em0 uhci2 3319 0
Another machine, also UP, but with two interfaces. The problem is not as
apparent as on the first machine, but it's there. This machine is not as
loaded usually (CPU wise) as the first machine. The problem is ONLY on
em1:
FreeBSD hudson 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #48: Thu Sep 14 10:19:46
CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/NFS-32-FBSD6 i386
[EMAIL PROTECTED]:1:0: class=0x020000 card=0x10758086 chip=0x10758086
rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = '82547EI Gigabit Ethernet Controller'
class = network
subclass = ethernet
[EMAIL PROTECTED]:2:0: class=0x020000 card=0x10768086 chip=0x10768086
rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = '82547EI Gigabit Ethernet Controller'
class = network
subclass = ethernet
irq17: em1 ichsmb0 950121879 855
irq18: em0 71437344 64
The problem appeared after the em updates that went into the kernel over
the last few weeks and had not been observed before. em is always loaded as a
module in my kernels. The problem seems to occur more often if the
machine's CPU is busy.
I have several SMP machines with the following em interfaces, which
DON'T show the problem, but they also have different chipsets on the em
interface. Most of the kernels were built between Sep 7 and Sep 19.
3 times this:
[EMAIL PROTECTED]:5:0: class=0x020000 card=0x34248086 chip=0x10108086
rev=0x01 hdr=0x00
[EMAIL PROTECTED]:5:1: class=0x020000 card=0x34248086 chip=0x10108086
rev=0x01 hdr=0x00
irq23: em0 970303432 750
3 times this:
[EMAIL PROTECTED]:5:0: class=0x020000 card=0x34258086 chip=0x100e8086
rev=0x02 hdr=0x00
irq23: em0 292477376 435
So I can observe at least 3 interesting differences:
- the interface showing the problems shares the interrupt
- for me it happens on UP machines only
- the chips are different
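To make the first point easier to check across several boxes, here is a minimal sketch that scans `vmstat -i` style lines and reports where an em interface shares its interrupt line with another device. The function names `parse_vmstat_i` and `shared_with` are made up for illustration, and the sample simply reuses the lines quoted in this message (they come from different machines, which is why irq18 shows up twice).

```python
import re

# Lines copied from the vmstat -i excerpts quoted in this thread.
# They come from different machines and are only used as parser input.
SAMPLE = """\
irq18: em0 uhci2 3319 0
irq17: em1 ichsmb0 950121879 855
irq18: em0 71437344 64
irq23: em0 970303432 750
"""

# irqN: <one or more device names> <total> <rate>
LINE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def parse_vmstat_i(text):
    """Return a list of (irq, [devices], total, rate) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip headers, totals, and anything unexpected
        irq, devs, total, rate = m.groups()
        rows.append((irq, devs.split(), int(total), int(rate)))
    return rows


def shared_with(rows, prefix="em"):
    """List (irq, em devices, other devices) where an em NIC shares a line."""
    out = []
    for irq, devs, _total, _rate in rows:
        ems = [d for d in devs if d.startswith(prefix)]
        others = [d for d in devs if not d.startswith(prefix)]
        if ems and others:
            out.append((irq, ems, others))
    return out


if __name__ == "__main__":
    for irq, ems, others in shared_with(parse_vmstat_i(SAMPLE)):
        print(irq, ems, "shares with", others)
```

On a live machine the same functions could be fed the text of `vmstat -i` instead of `SAMPLE`; the regular expression assumes the "irqN: names total rate" layout shown above and may need adjusting for other output formats.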
What I can't do: move the interfaces between machines; these are
onboard interfaces.
What I could do: I could try to unload the USB driver or the ichsmb
driver on the machines where the interrupts are shared. Anyway, the USB
is not used currently (I have it enabled to be prepared to hook up a USB
Mass Storage device, which has never happened since the problem occurred). The
ichsmb also is usually not queried.
Any suggestions on how I could help?
- Olli
Well, the best I can say at the moment is, "Wow." =-( I guess the
thing to do here is to figure out if the problem lies with the em
interrupt handler not getting run, or the taskqueue not getting run.
Since you've stated that it seems to be related to shared interrupts,
the first possibility is more likely. However, I'm not sure why the
symptom would only be showing up now. The Intel docs say that the
82547EI is a bit interesting, and I wonder if assumptions that we
make about PCI ordering aren't true (or if there are bugs that make
our assumptions invalid).
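One rough way to gather evidence for the first possibility is to take two `vmstat -i` snapshots around a watchdog timeout and see whether the counter on the em interrupt line advanced at all. The sketch below only does the arithmetic. The helper names `counts` and `deltas` are hypothetical, the BEFORE lines are taken from this thread, and the AFTER totals are invented for illustration. Note the assumption that a shared line has a single counter for all devices on it, so a non-zero delta there does not prove the em handler itself ran, and a zero delta says nothing about the taskqueue.

```python
import re

# irqN: <one or more device names> <total> <rate>
LINE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def counts(text):
    """Map 'irqN' -> (device names, total interrupt count)."""
    result = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            result[m.group(1)] = (m.group(2).split(), int(m.group(3)))
    return result


def deltas(before, after):
    """Return {irq: (devices, increase)} for IRQs present in both snapshots."""
    b, a = counts(before), counts(after)
    return {irq: (a[irq][0], a[irq][1] - b[irq][1]) for irq in a if irq in b}


# BEFORE uses totals quoted in the thread; the AFTER totals are INVENTED
# purely to exercise the code (irq17 stalled, irq18 still counting).
BEFORE = "irq17: em1 ichsmb0 950121879 855\nirq18: em0 71437344 64\n"
AFTER = "irq17: em1 ichsmb0 950121879 855\nirq18: em0 71437984 64\n"

if __name__ == "__main__":
    for irq, (devs, inc) in sorted(deltas(BEFORE, AFTER).items()):
        state = "no new interrupts" if inc == 0 else f"+{inc}"
        print(irq, " ".join(devs), state)
```

A line whose count stays flat while the interface is supposed to be passing traffic would point toward interrupts not being delivered; it is a hint to follow up on, not a diagnosis.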
Does this happen after there has been a lot of disk activity, like a
large tar extraction? Are you using the SMBus interface at all, or is
it sitting completely idle?
Scott