On Fri, Feb 28, 2020 at 11:36:55AM +0100, Paolo Bonzini wrote:
> On 26/02/20 23:50, Peter Xu wrote:
> > VFIO INTx is not working with split irqchip. On new kernels KVM_IRQFD
> > will directly fail with resamplefd attached so QEMU will automatically
> > fallback to the INTx slow path. However on old kernels it's still
> > broken.
> >
> > Only until recently I noticed that this could also break PXE boot for
> > assigned NICs [1]. My wild guess is that the PXE ROM will be mostly
> > using INTx as well, which means we can't bypass that even if we
> > enables MSI for the guest kernel.
> >
> > This series tries to first fix this issue function-wise, then speed up
> > for the INTx again with resamplefd (mostly following the ideas
> > proposed by Paolo one year ago [2]). My TCP_RR test shows that:
> >
> > - Before this series: this is broken, no number to show
> >
> > - After patch 1 (enable slow path): get 63% perf comparing to full
> >   kernel irqchip
>
> Oh, I thought something like patch 1 had already been applied.
>
> One comment: because you're bypassing IOAPIC when raising the irq, the
> IOAPIC's remote_irr for example will not be set. Most OSes probably
> don't care, but it's at least worth a comment.

Ouch, I should definitely do that... How about something like this (in
ioapic_eoi_broadcast(); I even changed kvm_resample_fd_notify() to return
a boolean showing whether some GSI was kicked, so in that case we don't
need to go on to check irr and remote-irr):

        /*
         * When IOAPIC is in the userspace while APIC is still in
         * the kernel (i.e., split irqchip), we have a trick to
         * kick the resamplefd logic for registered irqfds from
         * userspace to deactivate the IRQ.  When that happens, it
         * means the irq bypassed userspace IOAPIC (so the irr and
         * remote-irr of the table entry should be bypassed too
         * even if an interrupt comes), then we don't need to clear
         * the remote-IRR and check irr again because they'll
         * always be zeros.
         */
        if (kvm_resample_fd_notify(n)) {
            continue;
        }

I confess this is still tricky, and actually after a careful read I
noticed you've proposed a similar kernel fix for the problem too, which I
overlooked (https://patchwork.kernel.org/patch/10738541/#22609933).  My
current thought is that we keep this hackery in userspace only, so that
split+resamplefd stays forbidden in the kernel and things stay clean
there.  What's your opinion?

(I should have marked this series as RFC when posting.)

Thanks,

-- 
Peter Xu
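
To make the early "continue" above concrete, here is a minimal standalone sketch of the control flow in C. It is not the QEMU code: the pin_entry layout, the resample_registered[] table, the counters, and the stub body of kvm_resample_fd_notify() are all assumptions invented for illustration. Only the function name and its boolean return value come from the mail above.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PINS 24

/* Hypothetical, simplified redirection-table entry (not QEMU's layout). */
struct pin_entry {
    uint8_t vector;
    bool level_triggered;
    bool remote_irr;
};

static struct pin_entry pins[NUM_PINS];
static uint32_t irr;                        /* pending bits, one per pin */
static bool resample_registered[NUM_PINS];  /* stand-in for registered resamplefds */
static int resample_kicks[NUM_PINS];        /* how often each GSI was kicked */
static int reinjected[NUM_PINS];            /* how often each pin was re-serviced */

/* Stub (assumption): true if a resamplefd is registered for gsi and was kicked. */
static bool kvm_resample_fd_notify(int gsi)
{
    if (!resample_registered[gsi]) {
        return false;
    }
    resample_kicks[gsi]++;
    return true;
}

/* Simplified EOI broadcast: walk every pin that uses the EOI'd vector. */
static void eoi_broadcast(uint8_t vector)
{
    for (int n = 0; n < NUM_PINS; n++) {
        struct pin_entry *e = &pins[n];

        if (e->vector != vector) {
            continue;
        }
        /*
         * The interrupt was delivered via irqfd and bypassed this
         * userspace table, so irr/remote_irr were never touched:
         * kick the resamplefd and skip the bookkeeping below.
         */
        if (kvm_resample_fd_notify(n)) {
            continue;
        }
        if (!e->level_triggered || !e->remote_irr) {
            continue;
        }
        e->remote_irr = false;
        if (irr & (1u << n)) {
            reinjected[n]++;    /* line still asserted: service it again */
        }
    }
}
```

In this sketch a pin with a registered resamplefd never has its remote_irr or irr examined on EOI, while an ordinary level-triggered pin follows the usual clear-and-recheck path.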