On Sat, 27 Apr 2019 10:09:51 +0200 Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 27/04/19 07:29, Paolo Bonzini wrote:
> >
> >>> In my testing it looks like KVM advertises supporting the KVM_IRQFD
> >>> resample feature, but vfio never gets the unmask notification, so the
> >>> device remains with DisINTx set and no further interrupts are
> >>> generated. Do we expect KVM's IRQFD with resampler to work in the
> >>> split IRQ mode? We can certainly hope that "high performance" devices
> >>> use MSI or MSI/X, but this would be quite a performance regression with
> >>> split mode if our userspace bypass for INTx goes away. Thanks,
> >>
> >> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> >> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> >> kvm_notify_acked_gsi(),
> >
> > That wouldn't help because kvm_ioapic_update_eoi would not even be
> > able to access vcpu->kvm->arch.vioapic (it's NULL).
> >
> > The following untested patch would signal the resamplefd in
> > kvm_ioapic_send_eoi, before requesting the exit to userspace. However
> > I am not sure how QEMU sets up the VFIO eventfds: if I understand
> > correctly, when VFIO writes again to the irq eventfd, the interrupt
> > request would not reach the userspace IOAPIC, but only the in-kernel
> > LAPIC. That would be incorrect, and if my understanding is correct we
> > need to trigger resampling from hw/intc/ioapic.c.
>
> Actually it's worse: because you're bypassing IOAPIC when raising the
> irq, the IOAPIC's remote_irr for example will not be set. So split
> irqchip currently must disable the intx fast path completely.
>
> I guess we could also reimplement irqfd and resamplefd in the userspace
> IOAPIC, and run the listener in a separate thread (using "-object
> iothread" on the command line and AioContext in the code).

This sounds like a performance regression vs KVM irqchip any way we slice it. Was this change a mistake?
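To make the failure in the quoted exchange concrete, here is a toy state machine in C of the INTx bypass with a resampling irqfd: the host handler masks the device (DisINTx) and signals the irqfd, and the guest's EOI is supposed to signal the resamplefd so that VFIO unmasks the device again. All names (struct intx_model, device_assert, resample, guest_service_and_eoi) are hypothetical; this is a sketch under those assumptions, not KVM, VFIO or QEMU code, and it models only the missing unmask, not the remote_irr problem Paolo raises.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the VFIO INTx bypass. All names are hypothetical;
 * this is not KVM, VFIO or QEMU code. */
enum irqchip_mode { IRQCHIP_KERNEL, IRQCHIP_SPLIT };

struct intx_model {
    enum irqchip_mode mode;
    bool device_pending;  /* device is asserting its level-triggered line */
    bool masked;          /* DisINTx set by the host handler */
    bool irqfd_signaled;  /* interrupt injected into the guest via irqfd */
    int injected;         /* interrupts the guest has received */
};

/* Device raises INTx: the host handler masks it (DisINTx) and signals
 * the irqfd, unless it is still masked from a previous interrupt. */
static void device_assert(struct intx_model *m)
{
    m->device_pending = true;
    if (!m->masked) {
        m->masked = true;
        m->irqfd_signaled = true;
        m->injected++;
    }
}

/* Resamplefd handler: unmask, and re-fire if the line is still asserted. */
static void resample(struct intx_model *m)
{
    m->masked = false;
    if (m->device_pending)
        device_assert(m);
}

/* Guest driver clears the device's interrupt status, then EOIs. */
static void guest_service_and_eoi(struct intx_model *m)
{
    assert(m->irqfd_signaled);
    m->irqfd_signaled = false;
    m->device_pending = false;

    if (m->mode == IRQCHIP_KERNEL) {
        /* In-kernel IOAPIC: the EOI runs the ack notifiers, which
         * signal the resamplefd, and VFIO unmasks the device. */
        resample(m);
    }
    /* IRQCHIP_SPLIT: the EOI exits to userspace before any ack notifier
     * runs, so the resamplefd is never signaled and DisINTx stays set. */
}
```

With IRQCHIP_KERNEL, a second device_assert() after the EOI reaches the guest; with IRQCHIP_SPLIT it is swallowed because the device is still masked, matching the "no further interrupts are generated" symptom.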
Without KVM support for the resampling irqfd, the universal support in QEMU kicks in: device mmaps are disabled when an INTx occurs, forcing trapped access to the device, and we assume that the next access is in response to the interrupt, so we trigger our own internal EOI and re-enable the mmaps. A timer acts as a catch-all. Needless to say, this is functional but not fast. It would be a massive performance regression for devices that depend on INTx and previously used the KVM bypass to have to switch to this path.

INTx is largely considered a legacy interrupt, so non-x86 archs don't encounter it as often; S390 even explicitly disables INTx support. ARM and POWER likely just don't see many of these devices, but nearly all devices on x86 (except SR-IOV VFs) expect an INTx fallback mode, and some drivers may run the device in INTx mode for compatibility. This split irqchip change was likely fine for "enterprise" users concerned only with modern high-speed devices, but very much not for device assignment used for compatibility use cases or for commodity hardware users.

What's a good 4.0.1 strategy to resolve this? Reinstate KVM irqchip as the Q35 default? I can't see simply switching to the current QEMU handling as a viable option for performance.

What about 4.1? We could certainly improve EOI support in QEMU (there's essentially no support currently), but it seems like an uphill battle for an iothread-based userspace IOAPIC to ever compare to KVM handling.

Thanks,
Alex
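The trap-based fallback described above can be sketched as another toy state machine in C. The struct fallback_model type and the fb_* functions are hypothetical names chosen for illustration; this is a simplification under those assumptions, not QEMU's vfio-pci implementation, and it omits details such as how the real code schedules its timer. It only shows the shape of the mechanism: mask and disable mmaps on INTx, treat the next trapped BAR access as the guest servicing the interrupt, EOI internally, and fall back to a timer if no access arrives.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the trap-based INTx fallback. Names are hypothetical;
 * this is not QEMU's implementation. */
struct fallback_model {
    bool mmap_enabled;  /* guest reaches the BARs directly, no exits */
    bool masked;        /* INTx masked after delivery to the guest */
    bool pending;       /* device is asserting its line */
    int injected;       /* interrupts delivered to the guest */
    int trapped;        /* BAR accesses that took the slow trapped path */
};

/* Model invariant: mmaps are only disabled while INTx is masked. */
static void fb_check(const struct fallback_model *m)
{
    assert(m->mmap_enabled || m->masked);
}

static void fb_init(struct fallback_model *m)
{
    *m = (struct fallback_model){ .mmap_enabled = true };
    fb_check(m);
}

/* INTx fires: mask it, deliver it, and disable mmaps so that the
 * guest's next BAR access traps. */
static void fb_device_assert(struct fallback_model *m)
{
    m->pending = true;
    if (!m->masked) {
        m->masked = true;
        m->mmap_enabled = false;
        m->injected++;
    }
    fb_check(m);
}

static void fb_device_deassert(struct fallback_model *m)
{
    m->pending = false;
}

/* Internal EOI: unmask, re-enable mmaps, re-fire if still asserted. */
static void fb_eoi(struct fallback_model *m)
{
    m->masked = false;
    m->mmap_enabled = true;
    if (m->pending)
        fb_device_assert(m);
    fb_check(m);
}

/* Guest BAR access. While mmaps are disabled it traps, and the model
 * assumes it is the guest servicing the interrupt, so it EOIs. */
static void fb_guest_bar_access(struct fallback_model *m, bool clears_status)
{
    if (m->mmap_enabled)
        return; /* fast path: direct access, invisible to userspace */
    m->trapped++;
    if (clears_status)
        fb_device_deassert(m);
    fb_eoi(m);
}

/* Catch-all timer: EOI even if the guest never touched the BARs. */
static void fb_timer_expired(struct fallback_model *m)
{
    if (!m->mmap_enabled)
        fb_eoi(m);
}
```

In this model every serviced interrupt costs at least one trapped BAR access or a timer expiry before the device is unmasked, which is the overhead the KVM resamplefd path avoids.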