On 25.08.2024 20:03, Stewart Hildebrand wrote:
> On 8/13/24 10:01, Jan Beulich wrote:
>> One aspect I didn't properly consider when making the suggestion: What if,
>> without all VFs having gone away, the PF is re-added? In that case we
>> would better recycle the existing structure. That's getting a little
>> complicated, so maybe a better approach is to refuse the request (in
>> pci_remove_device()) when the list isn't empty?
>
> I set up a test case locally to remove a PF without removing the
> associated VFs by hacking an SR-IOV PF driver. Although the PF driver
> *should* remove the VFs first, it's completely up to the particular PF
> driver how VFs/PFs are removed during hot-un-plug, in what order, or
> whether at all to remove the VFs before removing the PF. Anyway, during
> PF-only removal, at least the Linux PCI subsystem warns about it:
>
> [ 106.500334] igb 0000:01:00.0: driver left SR-IOV enabled after remove
>
> Returning an error code in pci_remove_device() results in only a warning
> from Linux:
>
> [ 106.507011] pci 0000:01:00.0: Failed to delete - passthrough or MSI/MSI-X
> might fail!
>
> Despite the warning, Linux still proceeds to remove the PF, and we would
> retain a stale PF in Xen. Re-adding (hotplugging) the just-removed PF
> led to Xen crashing in another weird way.
>
> To handle this more gracefully, I suggest removing the VFs right away
> along with the PF in pci_remove_device() when a PF removal request comes
> along. This would satisfy the test case described here without Xen
> crashing.

Hmm. That's an option, yet would introduce an asymmetry: The PF can be
added late (after VFs), so it would only seem consistent to allow it to
be removed early (keeping the VFs). Suitably justified / commented it may
nevertheless be the route to take, for (hopefully) reducing possible
complications.

Jan
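
For illustration only, the removal order suggested in the quoted mail
(dropping the VFs together with their PF, so no stale entry survives a
later re-add) might be sketched on a toy model as below. None of this is
Xen code: struct toy_pdev, toy_add(), toy_remove() and toy_count() are
hypothetical names, and the fixed-size table merely stands in for Xen's
real device lists and locking, which are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 8

/* Toy stand-in for a PCI device record; NOT Xen's struct pci_dev. */
struct toy_pdev {
    bool present;
    bool is_vf;
    int pf_index;   /* slot of the owning PF for a VF, -1 for a PF */
};

static struct toy_pdev devs[MAX_DEVS];

/* Register a device in the first free slot; returns its slot or -1. */
static int toy_add(bool is_vf, int pf_index)
{
    for (int i = 0; i < MAX_DEVS; i++) {
        if (!devs[i].present) {
            devs[i].present = true;
            devs[i].is_vf = is_vf;
            devs[i].pf_index = is_vf ? pf_index : -1;
            return i;
        }
    }
    return -1;
}

/*
 * Remove a device. When a PF is removed, every VF still pointing at it
 * is removed first, so no VF is left referring to a vanished PF.
 * Returns 0 on success, -1 if the slot is invalid or already empty.
 */
static int toy_remove(int idx)
{
    if (idx < 0 || idx >= MAX_DEVS || !devs[idx].present)
        return -1;

    if (!devs[idx].is_vf) {
        for (int i = 0; i < MAX_DEVS; i++) {
            if (devs[i].present && devs[i].is_vf && devs[i].pf_index == idx)
                devs[i].present = false;
        }
    }

    devs[idx].present = false;
    assert(!devs[idx].present);
    return 0;
}

/* Number of devices currently registered. */
static size_t toy_count(void)
{
    size_t n = 0;

    for (int i = 0; i < MAX_DEVS; i++)
        n += devs[i].present;
    return n;
}
```

In this model a PF removal leaves nothing behind, so re-adding the PF
starts from a clean slate. The asymmetry noted above remains: a VF may be
registered before its PF, but cannot outlive it.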