Zhang Haoyu wrote on 2014-08-29: > Hi, Yang, Gleb, Michael, > Could you help review below patch please?
I don't quite understand the background. Why ioacpi->irr is setting before EOI? It should be driver's responsibility to clear the interrupt before issuing EOI. > > Thanks, > Zhang Haoyu > >>> Hi Jason, >>> I tested below patch, it's okay, the e1000 interrupt storm disappeared. >>> But I am going to make a bit change on it, could you help review it? >>> >>>> Currently, we call ioapic_service() immediately when we find the irq >>>> is still active during eoi broadcast. But for real hardware, there's >>>> some dealy between the EOI writing and irq delivery (system bus >>>> latency?). So we need to emulate this behavior. Otherwise, for a >>>> guest who haven't register a proper irq handler , it would stay in >>>> the interrupt routine as this irq would be re-injected immediately >>>> after guest enables interrupt. This would lead guest can't move >>>> forward and may miss the possibility to get proper irq handler >>>> registered (one example is windows guest resuming from hibernation). >>>> >>>> As there's no way to differ the unhandled irq from new raised ones, >>>> this patch solve this problems by scheduling a delayed work when the >>>> count of irq injected during eoi broadcast exceeds a threshold value. >>>> After this patch, the guest can move a little forward when there's no >>>> suitable irq handler in case it may register one very soon and for >>>> guest who has a bad irq detection routine ( such as note_interrupt() >>>> in linux ), this bad irq would be recognized soon as in the past. >>>> >>>> Signed-off-by: Jason Wang <jasowang <at> redhat.com> --- >>>> virt/kvm/ioapic.c | 47 >>>> +++++++++++++++++++++++++++++++++++++++++++++-- virt/kvm/ioapic.h | >>>> 2 ++ 2 files changed, 47 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index >>>> dcaf272..892253e 100644 --- a/virt/kvm/ioapic.c +++ >>>> b/virt/kvm/ioapic.c <at> <at> -221,6 +221,24 <at> <at> int >>>> kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level) >>>> return ret; } >>>> >>>> +static void kvm_ioapic_eoi_inject_work(struct work_struct *work) +{ >>>> + int i, ret; + struct kvm_ioapic *ioapic = container_of(work, struct >>>> kvm_ioapic, + >>>> eoi_inject.work); + spin_lock(&ioapic->lock); >>>> + for (i = 0; i < IOAPIC_NUM_PINS; i++) { + union >>>> kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; + + if >>>> (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) + >>>> continue; + + if >>>> (ioapic->irr & (1 << i) && !ent->fields.remote_irr) + >>>> ret = >>>> ioapic_service(ioapic, i); + } + spin_unlock(&ioapic->lock); +} + >>>> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int >>>> vector, int trigger_mode) { <at> <at> >>>> -249,8 +267,29 <at> >>>> <at> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, >>>> int vector, >>>> >>>> ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); >>>> ent->fields.remote_irr = 0; >>>> - if (!ent->fields.mask && (ioapic->irr & (1 << i))) >>>> - ioapic_service(ioapic, i); >>>> + if (!ent->fields.mask && (ioapic->irr & (1 << i))) { >>>> + ++ioapic->irq_eoi; >>> -+ ++ioapic->irq_eoi; >>> ++ ++ioapic->irq_eoi[i]; >>>> + if (ioapic->irq_eoi == 100) { >>> -+ if (ioapic->irq_eoi == 100) { >>> ++ if (ioapic->irq_eoi[i] == 100) { >>>> + /* >>>> + * Real hardware does not deliver the irq so >>>> + * immediately during eoi broadcast, so we need >>>> + * to emulate this behavior. Otherwise, for >>>> + * guests who has not registered handler of a >>>> + * level irq, this irq would be injected >>>> + * immediately after guest enables interrupt >>>> + * (which happens usually at the end of the >>>> + * common interrupt routine). This would lead >>>> + * guest can't move forward and may miss the >>>> + * possibility to get proper irq handler >>>> + * registered. So we need to give some breath to >>>> + * guest. TODO: 1 is too long? >>>> + */ >>>> + schedule_delayed_work(&ioapic->eoi_inject, 1); >>>> + ioapic->irq_eoi = 0; >>> -+ ioapic->irq_eoi = 0; >>> ++ ioapic->irq_eoi[i] = 0; >>>> + } else { >>>> + ioapic_service(ioapic, i); >>>> + } >>>> + } >>> ++ else { >>> ++ ioapic->irq_eoi[i] = 0; >>> ++ } >>>> } >>>> } >>> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi >>> broadcast, it's possible that another interrupt's (not current eoi's >>> corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow >>> continually, and not too long, ioapic->irq_eoi will reach to 100. I >>> want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;". >>> Any ideas? >>> >>> Zhang Haoyu >> >> I don't have objection to this per irq counter, but I don't see obvious >> difference. >> >> Btw, the patch has been rejected in the past since we're not sure >> whether this behaviour is architectural (or ask Intel?). You need >> convince the Gleb and other kvm maintainers for this idea first :). Best regards, Yang