On 27/04/19 07:29, Paolo Bonzini wrote: > >>> In my testing it looks like KVM advertises supporting the KVM_IRQFD >>> resample feature, but vfio never gets the unmask notification, so the >>> device remains with DisINTx set and no further interrupts are >>> generated. Do we expect KVM's IRQFD with resampler to work in the >>> split IRQ mode? We can certainly hope that "high performance" devices >>> use MSI or MSI/X, but this would be quite a performance regression with >>> split mode if our userspace bypass for INTx goes away. Thanks, >> >> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before >> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via >> kvm_notify_acked_gsi(), > > That wouldn't help because kvm_ioapic_update_eoi would not even be > able to access vcpu->kvm->arch.vioapic (it's NULL). > > The following untested patch would signal the resamplefd in > kvm_ioapic_send_eoi, > before requesting the exit to userspace. However I am not sure how QEMU > sets up the VFIO eventfds: if I understand correctly, when VFIO writes again > to > the irq eventfd, the interrupt request would not reach the userspace IOAPIC, > but > only the in-kernel LAPIC. That would be incorrect, and if my understanding is > correct we need to trigger resampling from hw/intc/ioapic.c.
Actually it's worse: because you're bypassing IOAPIC when raising the irq, the IOAPIC's remote_irr for example will not be set. So split irqchip currently must disable the intx fast path completely. I guess we could also reimplement irqfd and resamplefd in the userspace IOAPIC, and run the listener in a separate thread (using "-object iothread" on the command line and AioContext in the code). Paolo > > Something like: > > 1) VFIO uses kvm_irqchip_add_irqfd_notifier_gsi instead of KVM_IRQFD directly > > 2) add a NotifierList in kvm_irqchip_assign_irqfd > > 3) if kvm_irqchip_is_split(), register a corresponding Notifier in ioapic.c > which > stores the resamplefd as an EventNotifier* > > 4) ioapic_eoi_broadcast writes to the resamplefd > > BTW, how do non-x86 platforms handle intx resampling? > > Paolo > > ---- 8< ----- > From: Paolo Bonzini <pbonz...@redhat.com> > Subject: [PATCH WIP] kvm: lapic: run ack notifiers for userspace IOAPIC routes > > diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h > index ea1a4e0297da..be337e06e3fd 100644 > --- a/arch/x86/kvm/ioapic.h > +++ b/arch/x86/kvm/ioapic.h > @@ -135,4 +135,6 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, > ulong *ioapic_handled_vectors); > void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, > ulong *ioapic_handled_vectors); > +void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector); > + > #endif > diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c > index 3cc3b2d130a0..46ea79a0dd8a 100644 > --- a/arch/x86/kvm/irq_comm.c > +++ b/arch/x86/kvm/irq_comm.c > @@ -435,6 +435,34 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, > srcu_read_unlock(&kvm->irq_srcu, idx); > } > > +void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector) > +{ > + struct kvm_kernel_irq_routing_entry *entry; > + struct kvm_irq_routing_table *table; > + u32 i, nr_ioapic_pins; > + int idx; > + > + idx = srcu_read_lock(&kvm->irq_srcu); > + table = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu); > + nr_ioapic_pins = min_t(u32, table->nr_rt_entries, > + kvm->arch.nr_reserved_ioapic_pins); > + > + for (i = 0; i < nr_ioapic_pins; i++) { > + hlist_for_each_entry(entry, &table->map[i], link) { > + struct kvm_lapic_irq irq; > + > + if (entry->type != KVM_IRQ_ROUTING_MSI) > + continue; > + > + kvm_set_msi_irq(kvm, entry, &irq); > + if (irq.level && irq.vector == vector) > + kvm_notify_acked_gsi(kvm, i); > + } > + } > + > + srcu_read_unlock(&kvm->irq_srcu, idx); > +} > + > void kvm_arch_irq_routing_update(struct kvm *kvm) > { > kvm_hv_irq_routing_update(kvm); > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 9f089e2e09d0..8f8e76d77925 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -1142,6 +1142,7 @@ static bool kvm_ioapic_handles_vector(struct kvm_lapic > *apic, int vector) > > static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) > { > + struct kvm_vcpu *vcpu = apic->vcpu; > int trigger_mode; > > /* Eoi the ioapic only if the ioapic doesn't own the vector. */ > @@ -1149,9 +1150,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic > *apic, int vector) > return; > > /* Request a KVM exit to inform the userspace IOAPIC. */ > - if (irqchip_split(apic->vcpu->kvm)) { > - apic->vcpu->arch.pending_ioapic_eoi = vector; > - kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu); > + if (irqchip_split(vcpu->kvm)) { > + kvm_notify_userspace_ioapic_routes(vcpu->kvm, vector); > + vcpu->arch.pending_ioapic_eoi = vector; > + kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu); > return; > } > > @@ -1160,7 +1162,7 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, > int vector) > else > trigger_mode = IOAPIC_EDGE_TRIG; > > - kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode); > + kvm_ioapic_update_eoi(vcpu, vector, trigger_mode); > } > > static int apic_set_eoi(struct kvm_lapic *apic) > > >> so it looks like KVM really ought to return an >> error when trying to register a resample IRQFD when irqchip_split(), >> but do we have better options? EOI handling in QEMU is pretty much >> non-existent and even if it was in place, it's a big performance >> regression for vfio INTx handling. Is a split irqchip that retains >> performant resampling (EOI) support possible? Thanks,