Good to know that we have a solution. :) I think this is a good fix, however I still do not understand why this is happening... Please see below comment.
On Thu, Sep 15, 2016 at 04:11:48PM +1000, David Gibson wrote: > d1f6af6 "kvm-irqchip: simplify kvm_irqchip_add_msi_route" was a cleanup > of kvmchip routing configuration, that was mostly intended for x86. > However, it also contains a subtle change in behaviour which breaks EEH[1] > error recovery on certain VFIO passthrough devices on spapr guests. So far > it's only been seen on a BCM5719 NIC on a POWER8 server, but there may be > other hardware with the same problem. It's also possible there could be > circumstances where it causes a bug on x86 as well, though I don't know of > any obvious candidates. > > Prior to d1f6af6, both vfio_msix_vector_do_use() and > vfio_add_kvm_msi_virq() used msg == NULL as a special flag to mark this > as the "dummy" vector used to make the host hardware state sync with the > guest expected hardware state in terms of MSI configuration. > > Specifically that flag caused vfio_add_kvm_msi_virq() to become a no-op, > meaning the dummy irq would always be delivered via qemu. AFAICT, QEMU is not delivering that *dummy* IRQ as well. IIUC here the dummy IRQ refers to the one in vfio_msix_enable(): vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL); vfio_msix_vector_release(&vdev->pdev, 0); Here we registered this dummy one to sync up the guest and host hardware state for some reason. However, the handler is NULL here, means that even QEMU won't handle it if it happens. And the only difference is that during this extremely small period, kvm will handle the interrupt with an uninitialized MSI message, but assuming that would never happen since no one should start to use MSI yet. After release(), interrupts will be dropped for all cases. So looks like it is not related to "whether QEMU or KVM will handle the interrupt", but something else. Generally speaking, the process of registering the IRQ should be something like (using QEMU with d1f6af6 change): 1. vfio_msix_enable(): when MSIX is enabled. this will trigger vfio_add_kvm_msi_virq(), but quickly it is unregistered (virq will be there, but VFIO will assign the VFIO handler to vector->interrupt, rather than vector->kvm_interrupt, so that virq won't take effect). 2. vfio_msix_vector_use(): when an MSIX entry is finally used and setup. this will trigger vfio_update_kvm_msi_virq(), to update the interrupt information to the "real one". IMHO we should always receive interrupts after step (2). And as we have traced (with David), step (2) was updating the correct interrupt information even with commit d1f6af6. But I think I might be wrong (or say, the above assumption is not correct), because commit d1f6af6 indeed caused problem with EHH and this specific hardware. I am just do not know why it is triggering the issue. And that's why I want to know whether there is anything special with that NIC's driver. Again, this is totally a discussion only, and I am totally agree with current change to fix the issue. Just in case someone knows the reason behind this. Thanks, -- peterx