On 20.07.2010, at 21:59, John Baldwin wrote:

>> I started narrowing the revisions down until I 
>> found out, that while on r202386 I'm still able to trigger the MCE, r202387 
>> seems to solve the problem on CURRENT:
>> 
>> http://svn.freebsd.org/viewvc/base?view=revision&revision=202387
> 
> Although this change was MFC'd, it was later disabled by default because it 
> causes issues on other machines.  I think there is a tunable you need to set 
> in loader.conf to enable it for 8.1.  Attilio (the author of that commit) 
> should know which tunable to set.

Might be this one in sys/amd64/amd64/clock.c:

----
static int lapic_allclocks = 1;
TUNABLE_INT("machdep.lapic_allclocks", &lapic_allclocks);
----

The r202387 changes put this into local_apic.c; I guess it was moved later on 
(or after the MFC), which is why I couldn't find it on 8-stable. And, indeed, 
this tunable seems to be gone again in current. I'm testing with 
machdep.lapic_allclocks=0 right now. So far it looks very promising. I'll let 
it run overnight.

Another thing though: Today I compared verbose boot output from 8-stable and 
the current box. I saw that the ioapic sets up IRQ routing differently on these 
two systems although the hardware is the same. This seemed not so interesting 
at first, but then I noticed that 8-stable sets up two routes (to lapic0 and 
lapic2, or sometimes lapic3) for IRQ58 (mpt0), while current only uses one 
route (to lapic0).

I used 'cpuset -c -l 0 -x 58' in an attempt to make my 8-stable box behave like 
the one running current. Indeed, this seems to have changed IRQ58 to be routed 
to lapic0 only, and the box then ran for hours without showing the symptoms.

I just checked the verbose boot output of my 8-stable box again (booted with 
machdep.lapic_allclocks=0 as mentioned above). It now seems to have set up its 
IRQ routes just like the current box (one route for IRQ58 to lapic0).

So I don't get which issue came first... If either one is ruled out, the 
problem seems to be gone. Was it the clock issue causing a wrong IRQ routing 
setup, which in turn causes mpt or the CPU to go nuts? Or is mpt having two 
interrupt routes actually a normal thing (then why doesn't current behave this 
way?), but the mpt driver causes strange things when operating with clock 
issues? Or have I misinterpreted something?

Here's the boot verbose output of ioapic related to interrupts 56 (em0), 57 
(em1) and 58 (mpt0):

---- 1st X4100M2 - running 8-stable (machdep.lapic_allclocks=1, MCEs can be 
reproduced easily) ----
# egrep '^ioapic' boot.normal | egrep 'IRQ 5[678]' | sort
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 0 vector 55
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 1 vector 50
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 0 vector 56
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 2 vector 50
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 0 vector 57
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 3 vector 50
----

---- 1st X4100M2 - running 8-stable (machdep.lapic_allclocks=0, test currently 
running, no MCEs so far) ----
# egrep '^ioapic' boot.lapic_allclocks0 | egrep 'IRQ 5[678]' | sort
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 0 vector 55
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 2 vector 50
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 0 vector 56
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 3 vector 50
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 0 vector 57
----

---- 2nd X4100M2 - running current (MCEs cannot be reproduced) ----
# dmesg | egrep '^ioapic' | egrep 'IRQ 5[678]' | sort
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 0 vector 55
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 2 vector 50
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 0 vector 56
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 3 vector 50
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 0 vector 57
----
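
In case someone wants to compare such logs without eyeballing them, here is a 
rough Python sketch that collects the lapic targets logged per IRQ and reports 
the IRQs where two logs differ. The function names (routes_by_irq, 
diff_routes) are made up for this example, and the regex only assumes the 
line format shown in the excerpts above. It says nothing about whether several 
lines for one IRQ are simultaneous routes or successive re-routings.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 3 vector 50
# The format is taken from the excerpts above; other kernels may log differently.
ROUTE_RE = re.compile(
    r"^ioapic(\d+): routing intpin (\d+) \((?:\w+ )?IRQ (\d+)\) "
    r"to lapic (\d+) vector (\d+)"
)


def routes_by_irq(lines):
    """Map each IRQ number to the lapic IDs logged for it, in log order."""
    routes = defaultdict(list)
    for line in lines:
        m = ROUTE_RE.match(line.strip())
        if m:
            routes[int(m.group(3))].append(int(m.group(4)))
    return dict(routes)


def diff_routes(a, b):
    """Return {irq: (lapics_in_a, lapics_in_b)} for IRQs whose targets differ."""
    return {
        irq: (a.get(irq, []), b.get(irq, []))
        for irq in sorted(set(a) | set(b))
        if a.get(irq, []) != b.get(irq, [])
    }


# Sample input copied from the two 8-stable excerpts above.
ALLCLOCKS_1 = """\
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 0 vector 55
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 1 vector 50
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 0 vector 56
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 2 vector 50
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 0 vector 57
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 3 vector 50
""".splitlines()

ALLCLOCKS_0 = """\
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 0 vector 55
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 2 vector 50
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 0 vector 56
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 3 vector 50
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 0 vector 57
""".splitlines()

if __name__ == "__main__":
    before = routes_by_irq(ALLCLOCKS_1)
    after = routes_by_irq(ALLCLOCKS_0)
    for irq, (old, new) in diff_routes(before, after).items():
        print(f"IRQ {irq}: lapics {old} -> {new}")
```

To run it against real logs, replace the sample lists with the lines of the 
saved boot output, e.g. open("boot.normal").read().splitlines().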


Markus

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
