2016-04-26 15:34+0800, Peter Xu:
> Hi, Jan,
>
> The above issue should be caused by EOI missing of level-triggered
> interrupts. Before that, I was always using edge-triggered
> interrupts for test, so didn't encounter this one. Would you please
> help try below patch? It can be applied directly onto the series,
> and should solve the issue (it works on my test vm, and I'll take it
> in v5 as well if it also works for you):
>
> -------------------------
>
> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> @@ -281,6 +281,36 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int
> size)
> +/*
> + * This is to satisfy the hack in Linux kernel. One hack of it is to
> + * simulate clearing the Remote IRR bit of IOAPIC entry using the
> + * following:
> + *
> + * "For IO-APIC's with EOI register, we use that to do an explicit EOI.
> + * Otherwise, we simulate the EOI message manually by changing the trigger
> + * mode to edge and then back to level, with RTE being masked during
> + * this."
> + *
> + * (See linux kernel __eoi_ioapic_pin() comment in commit c0205701)
> + *
> + * This is based on the assumption that, Remote IRR bit will be
> + * cleared by IOAPIC hardware for edge-triggered interrupts (I
> + * believe that's what the IOAPIC version 0x1X hardware does).

I thought that Linux doesn't use explicit "EOI" to IO-APIC, but relies
on EOI broadcast from LAPIC -- does that change with IR?

> + * So
> + * if we are emulating it, we'd better do it the same here, so that
> + * the guest kernel hack will work as well on QEMU.

Totally.

> + * Without this, level-triggered interrupts in IR mode might fail to
> + * work correctly.

(I don't really understand why it worked before.)

> + */
> +static inline void
> +ioapic_fix_edge_remote_irr(uint64_t *entry)
> +{
> +    if (*entry & IOAPIC_LVT_TRIGGER_MODE) {
> +        /* Level triggered interrupts, make sure remote IRR is zero */
> +        *entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);

(You can just unconditionally zero it, edge doesn't care.)

> +    }
> +}
> +
> @@ -314,6 +344,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
>                  s->ioredtbl[index] &= ~0xffffffffULL;
>                  s->ioredtbl[index] |= val;
>              }
> +            ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);

I think this can be done only in the else branch of (s->ioregsel & 1).
(If the guest kernel does level->edge->level, then remote_irr probably
 should be cleared only on edge->level transition and not on
 level->level, but I haven't seen that in the spec ...)

>              ioapic_service(s);
> ------------------------
>
> I am still looking into guest part codes. Although the above patch
> should solve the issue, there are still issues in guest codes when
> IR is enabled:
>
> - mismatched "vector" in IOAPIC entry and IRTE entry (this is
>   required in vt-d spec 5.1.5.1, and required to correctly deliver
>   EOI broadcast I guess). See intel_irq_remapping_prepare_irte():

"required" is a way of saying that the opposite is undefined.
No need to think about it in IOMMU.

> - I encountered that level-triggered entries in IOAPIC is marked as
>   edge-triggered interrupt in APIC (which is strange)...

What/where do you mean?
(The only difference I know of is that level triggered vectors in LAPIC
 have their respective TMR bit set while edge do not.)

Thanks.
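
P.S. To make the edge->level remark above concrete, here is a minimal
standalone sketch. It is an illustration only, not a tested replacement
for the hunk: the SKETCH_* names and sketch_rte_after_write() are
hypothetical, and the bit positions are assumed from the 82093AA
redirection table layout (bit 14 Remote IRR, bit 15 trigger mode). It
compares the old and new entry and leaves Remote IRR alone only on a
level->level write.

```c
#include <stdint.h>

/*
 * Assumed bit layout of a redirection table entry (82093AA datasheet).
 * The SKETCH_* names are local to this example, not QEMU's macros.
 */
#define SKETCH_RTE_REMOTE_IRR   (1ULL << 14)
#define SKETCH_RTE_TRIGGER_MODE (1ULL << 15)   /* 1 = level, 0 = edge */

/*
 * Hypothetical helper: given the entry before the guest write and the
 * entry after the written bits were merged in, return the value to keep.
 *
 *   new entry is edge        -> Remote IRR cleared (edge doesn't care)
 *   edge -> level transition -> Remote IRR cleared (the simulated EOI)
 *   level -> level           -> entry left as written
 */
static inline uint64_t
sketch_rte_after_write(uint64_t old_entry, uint64_t new_entry)
{
    int was_level = !!(old_entry & SKETCH_RTE_TRIGGER_MODE);
    int is_level = !!(new_entry & SKETCH_RTE_TRIGGER_MODE);

    if (!is_level || !was_level) {
        new_entry &= ~SKETCH_RTE_REMOTE_IRR;
    }
    return new_entry;
}
```

In ioapic_mem_write() this would presumably need the old value of
s->ioredtbl[index] saved before the low dword is overwritten; whether
real hardware distinguishes level->level from edge->level is exactly the
part I haven't found in the spec.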