On Wed, Jul 02, 2014 at 11:16:52AM +0200, Alexander Graf wrote:
Are you sure there's not just simply some irq unmasking event
after 5500ms we don't handle properly?
I poked around a bit, and the e1000 interrupt mask register is NOT the
problem (the LSC mask bit is clear at all times). If anything, maybe
the PIIX southbridge (or something further up "north") is masking PCI
interrupts (at least from e1000) until roughly 5500 ms into the boot
process ? Any ideas on how I could go about verifying this (without
access to the guest source, obviously :) ) would be very helpful...
Yeah, maybe the interrupt is masked and doesn't get delivered properly? See
if you can trace when the e1000 emulation starts kicking an interrupt and
when the guest tries to fetch it (there should be an ack register for IRQs
somewhere).
If we kick it but the guest doesn't react, the problem is further down -
check whether the IRQ ever got injected into the guest with trace points.
If we don't kick it, we mask it somewhere in the e1000 emulation and need to
make sure we do kick once we unmask :). I don't know whether the LSC mask is
the only one involved.
So it turns out everything I thought I knew (which was little indeed)
was more or less wrong. The problem, as far as I'm observing it now,
is that on PIIX, the OS X guest obsessively reads the ICR in a tight
loop. It reads the injected LSC (and probably discards it) before
unmasking the corresponding interrupt bit; later on, when it unmasks
LSC, giving the emulated e1000 hardware a chance to raise the irq
line, the actual LSC event has been flushed from the ICR, and the
driver does not detect the link coming up.
Here's how things work normally on Q35 (with INTERRUPT and PHY debugging
enabled, // my comments on the side):
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: Start link auto negotiation
e1000: Auto negotiation is completed
e1000: set_ics 4, ICR 0, IMR 0 // autoneg timer injects LSC
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ics 2, ICR 4, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ics 2, ICR 6, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ims df // guest unmasks interrupts
e1000: set_ics 0, ICR 6, IMR df
e1000: set_interrupt_cause: mit_irq_level=1 // first raising irq edge
e1000: ICR read: 6 // guest receives LSC (+more)
... and things work nicely from here on out :)
On PIIX, however, things look like this:
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
... <8155 "ICR read" repetitions deleted> ...
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc ffffffff
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: Start link auto negotiation
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
... <145 "ICR read" repetitions deleted> ...
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: Auto negotiation is completed
e1000: set_ics 4, ICR 0, IMR 0 // autoneg timer injects LSC
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 4 // guest reads (and discards?) LSC
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
... <23 "ICR read" repetitions deleted> ...
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ics 2, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 2
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ics 2, ICR 0, IMR 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ims df // guest unmasks interrupts
e1000: set_ics 0, ICR 2, IMR df
e1000: set_interrupt_cause: mit_irq_level=1 // first raising irq edge
e1000: ICR read: 2 // too late, LSC is gone :(
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc 2
e1000: set_ics 0, ICR 0, IMR dd
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ims df
e1000: set_ics 0, ICR 0, IMR df
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ics 80, ICR 0, IMR df
e1000: set_interrupt_cause: mit_irq_level=1
e1000: set_ics 3, ICR 80, IMR df
e1000: set_interrupt_cause: mit_irq_level=1
e1000: ICR read: 83
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_imc 82
e1000: set_ics 0, ICR 0, IMR 5d
e1000: set_interrupt_cause: mit_irq_level=0
e1000: set_ims df
e1000: set_ics 0, ICR 0, IMR df
e1000: set_interrupt_cause: mit_irq_level=0
e1000: ICR read: 0
e1000: set_interrupt_cause: mit_irq_level=0
... and so on...
Any clue as to why ICR gets read like that on PIIX, but not Q35 ?