on 21/07/2010 15:25 Markus Gebert said the following:
> On 21.07.2010, at 10:33, Andriy Gapon wrote:
>
>> on 21/07/2010 03:57 Markus Gebert said the following:
>>> Another thing though: Today I compared verbose boot output from 8-stable
>>> and the current box. I saw that the ioapic sets up IRQ routing differently
>>> on these two systems although the hardware is the same. This seemed not so
>>> interesting at first, but then I noticed that 8-stable sets up two routes
>>> (to lapic0 and lapic2, or sometimes lapic3) for IRQ58 (mpt0), while current
>>> only uses one route (to lapic0).
>> My understanding is that it's not "two routes", but re-routing. During early
>> boot all interrupts are bound to BSP; later, when APs become online, the
>> interrupts are re-distributed among available CPUs.
>
> I guess you're right, misinterpretation on my side. Thanks for clarifying
> this.
>
>
> Now being aware of this, it seems to me that in the machdep.lapic_allclocks=0
> case, there might just be more interrupts to be assigned/routed due to "more
> clocks being used". If that's true, maybe it's just "luck" that in this case
> the mpt interrupt gets assigned to lapic0/cpu0 and the box runs fine. I'm just
> guessing though, since I have no clue how interrupts are assigned to lapics
> exactly (round-robin? some logic?).

Yes, round-robin, for interrupts that are not explicitly bound to specific
CPUs. The process is deterministic, but indeed hard to predict.

>>> I used 'cpuset -c -l 0 -x 58' in an attempt to make my 8-stable box behave
>>> like the one running current. Indeed, this seems to have changed IRQ58 to
>>> be routed to lapic0 only. And the box was running for hours without showing
>>> the symptoms.
>>>
>>> I just checked boot verbose output of my 8-stable box again (booted with
>>> machdep.lapic_allclocks=0 as mentioned above). And now it seems to have set
>>> up IRQ routes just like the current box (one route for IRQ58 to lapic0).
>> Not sure how to interpret this properly. One possibility is a hardware
>> problem where the interrupt message route between ioapic2 and the CPU to
>> which lapic3 belongs is flaky. Perhaps this might be a FreeBSD problem: it
>> could be that the system somehow tells us not to set up such routes, but we
>> don't listen. But this is far-fetched.
>
>
> I'm not sure either. If my "theory" above proved to be true, it would have
> been just luck that 6.x and 7.x (and current) run just fine on the X4100M2. A
> (short) test on Ubuntu didn't trigger the problem, so the Linux kernel is
> either lucky too by selecting an interrupt route that is "not flaky", or
> there's indeed some way to figure out not to use some lapics for some
> interrupts. Or we didn't test Linux thoroughly enough.

Yep, it would be interesting to see how interrupts were distributed among CPUs
on that Linux.

-- 
Andriy Gapon
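
To illustrate why round-robin assignment is deterministic yet hard to predict,
here is a minimal sketch in Python. It is not the FreeBSD kernel code: the
function name assign_round_robin and every IRQ number except 58 (mpt0) are
made up for the example, and it only models the idea that unbound interrupts
are handed to CPUs in rotation while explicitly bound ones keep their CPU.

```python
from itertools import cycle


def assign_round_robin(irqs, cpus, bound=None):
    """Return a dict mapping each IRQ to a CPU.

    IRQs listed in `bound` keep their explicitly chosen CPU; all other
    IRQs are handed to the CPUs in rotation, in the order given.
    (Hypothetical helper, not a kernel interface.)
    """
    bound = bound or {}
    next_cpu = cycle(cpus)
    routes = {}
    for irq in irqs:
        if irq in bound:
            routes[irq] = bound[irq]
        else:
            routes[irq] = next(next_cpu)
    return routes


cpus = [0, 1, 2, 3]

# Assumed IRQ lists; only IRQ 58 (mpt0) is taken from the thread.
few_sources = [16, 17, 58]
more_sources = [16, 17, 18, 19, 58]  # e.g. extra clock interrupts present

print(assign_round_robin(few_sources, cpus)[58])                 # 2
print(assign_round_robin(more_sources, cpus)[58])                # 0
print(assign_round_robin(few_sources, cpus, bound={58: 0})[58])  # 0
```

Adding two more interrupt sources ahead of IRQ 58 moves it from CPU 2 to
CPU 0, which is the kind of shift a changed clock configuration could cause.
The last call models an explicit binding, where the rotation no longer matters.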