After such a long time silence, finally we have a really workable version. It has quite a few difference from previous, so does kernel driver parts. I will try to describe the big changes(and reason) as following. Also still have some unsure points need comments, and left some debug lines in the patch, so it is also a RFC version.
FYI: 1. v9 triggered a bug of igb driver which causes guest kernel oops, actually it doesn't have any relationship with aer patchset, we just trigger it. Now oops issue has been fixed: http://patchwork.ozlabs.org/patch/692171/ v10 changelog: 1. Many comments & commit message rephrase, and minor changes, not list them here. 2. Rebase on Eric Auger's "Convert VFIO-PCI to realize" patchset. 3. Introduce a new patch 3 to support configurable AER cap version. 4. Drop vfio_pci_host_match_slot() in patch 6, because host_match implies slot match 5. Change what eventfd signals(check vfio driver patch). Before, vfio-pci driver add 1 to the counter, which doesn't have meaning, just for notification. Now, vfio-pci's error detected handler read the uncorrectable error status and signal it to qemu. Why? When testing previous version(host aer driver still reset link), found that there is quite possibility that register reading returns the invalid value 0xFFFFFFFF, which results in all(2 in my environment) qemu's vfio-pci function send AER to root port while I only inject error into one function. This is unreasonable, and I think it is the result of reading during device reset. Previous patch does considered to find the real function who has error happened, but won't work under the situation I found. So move the register reading as early as possible would be the nature thoughts, and it is moved into vfio-pci driver. Although now reset_link is disabled in aer driver, get the uncor error status as early as possible still make sense, for example: if user space driver does device reset once receiving the error notification, and then read register. 6. Drop the ugly patch which does "wait 3s to do guest reset". There are several consideration in it. 1st, previous patch use a flag in function 0 to indicate that the functions under the virtual bus got error notification, then do bus reset rashly, as the response to guest reset_link. This is unreasonable. In physical world, set bridge's secondary bus reset would send hot-reset TLP to all functions below, trigger every device's reset separately. Emulated device should behave the same, means using DeviceClass->reset without any logical change. 2nd, in vfio-pci driver, we use pci_cfg_access_trylock() to prevent config access. When signal aer to guest, guest's aer driver will do config space access[*], which results in qemu process being scheduled out. Unlock the config space in vfio-pci resume handler, then there is no need to wait anything. [*] find_source_device->find_device_iter->is_error_source 7. For description integrality, I put the changelog of kernel part here. The *key issue* of whole functionality is on the reset_link in host error recovery. On one hand, device-specific driver in guest often will do register read/write during its error recovery, but reset_link in host will make the result of read/write unpredictable, the guest kernel oops issue is one of the results. On the other hand, link reset is one big step in whole error recovery. error recovery is very device specific thing, so, vfio-pci as the meta driver, won't know how to recover naturally. So it is user space driver, or device specific driver in guest, who should take the whole responsibility to do the error recovery, include link reset. Unsure points on vfio-pci driver: 1. Previous patch disabled irq notification in its error detected handler, but without enabling it again in resume handler, which is wrong. I am not sure whether we need to disable it, so I just removed the code. 2. Two new fields: aer_recovering & aer_error_completion of struct vfio_pci_device, and the related operation on them, I am not sure we still need them, for now, I think maybe we can drop them. Test: I test it with intel 82576 NIC, which has 2 functions, function 1 has cable connected to gateway, function 0 has no link. 1. Linux: guest works well with this patchset. Robustness is good. Steps: * start the VM with 2 assigned functions * assign ip to function 1, add route info, and ping the gateway * on host, injecting error to the assigned devices in any combination: inject fatal/nonfatal error to func 0/func 1. Or, repeat injecting certain error to certain function quickly. The result meets the expectation 2. Windows: only test on Win7, but recovery don't work, when injecting error, Win7 will show the blue screen, and then reboot. I think it is the bug of Windows driver, and not this patch's scope to fix it. On previous version, Win7 shows blue screen, then reboot, then blue again, reboot again, repeatedly, won't stop. Chen Fan (9): vfio: extract vfio_get_hot_reset_info as a single function vfio: squeeze out vfio_pci_do_hot_reset to support bus reset vfio: new function to init aer cap for vfio device vfio: refine function vfio_pci_host_match vfio: add check host bus reset is support or not pci: introduce function validation check during hotplug vfio: check aer functionality for hotplugged device vfio-pci: pass the aer error to guest vfio: add 'aer' property to expose aercap Dou Liyang (1): pcie_aer: support configurable AER capa version hw/net/e1000e.c | 3 +- hw/pci-bridge/ioh3420.c | 2 +- hw/pci-bridge/xio3130_downstream.c | 2 +- hw/pci-bridge/xio3130_upstream.c | 2 +- hw/pci/pci.c | 29 ++ hw/pci/pcie_aer.c | 5 +- hw/vfio/pci.c | 543 ++++++++++++++++++++++++++++++++----- hw/vfio/pci.h | 4 + include/hw/pci/pci.h | 1 + include/hw/pci/pcie_aer.h | 3 +- 10 files changed, 516 insertions(+), 78 deletions(-) -- 1.8.3.1