On 25.08.2024 20:03, Stewart Hildebrand wrote:
> On 8/13/24 10:01, Jan Beulich wrote:
>> One aspect I didn't properly consider when making the suggestion: What if,
>> without all VFs having gone away, the PF is re-added? In that case we
>> had better recycle the existing structure. That's getting a little
>> complicated, so maybe a better approach is to refuse the request (in
>> pci_remove_device()) when the list isn't empty?
> 
> I set up a test case locally to remove a PF without removing the
> associated VFs by hacking an SR-IOV PF driver. Although the PF driver
> *should* remove the VFs first, it's completely up to the particular PF
> driver how VFs/PFs are removed during hot-unplug, in what order, or
> whether at all to remove the VFs before removing the PF. Anyway, during
> PF-only removal, at least the Linux PCI subsystem warns about it:
> 
> [  106.500334] igb 0000:01:00.0: driver left SR-IOV enabled after remove
> 
> Returning an error code in pci_remove_device() results in only a warning
> from Linux:
> 
> [  106.507011] pci 0000:01:00.0: Failed to delete - passthrough or MSI/MSI-X might fail!
> 
> Despite the warning, Linux still proceeds to remove the PF, and we would
> retain a stale PF in Xen. Re-adding (hotplugging) the just-removed PF
> led to Xen crashing in another weird way.
> 
> To handle this more gracefully, I suggest removing the VFs right away
> along with the PF in pci_remove_device() when a PF removal request comes
> along. This would satisfy the test case described here without Xen
> crashing.

Hmm. That's an option, yet would introduce an asymmetry: The PF can be
added late (after VFs), so it would only seem consistent to allow it to
be removed early (keeping the VFs). Suitably justified / commented, it
may nevertheless be the route to take, for (hopefully) reducing possible
complications.
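
To make the two behaviours under discussion concrete, here is a toy model
of them. This is only a sketch and not Xen code: struct toy_pdev and the
toy_*() helpers are hypothetical names made up for illustration, and the
fixed-size array merely stands in for whatever list would link VFs to
their PF.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_VFS 8

/* Hypothetical stand-in for a device record; NOT Xen's struct pci_dev. */
struct toy_pdev {
    bool present;                       /* record still exists */
    struct toy_pdev *vfs[TOY_MAX_VFS];  /* VFs linked to this PF, if any */
    size_t nr_vfs;
};

/* Link a VF to its PF. Returns 0, or -ENOSPC if the toy array is full. */
static int toy_link_vf(struct toy_pdev *pf, struct toy_pdev *vf)
{
    assert(pf && vf);
    if (pf->nr_vfs == TOY_MAX_VFS)
        return -ENOSPC;
    pf->vfs[pf->nr_vfs++] = vf;
    return 0;
}

/*
 * Variant 1: refuse to remove a PF which still has VFs. If the caller
 * ignores the error, the record stays behind (the "stale PF" case).
 */
static int toy_remove_pf_refuse(struct toy_pdev *pf)
{
    assert(pf);
    if (pf->nr_vfs)
        return -EBUSY;
    pf->present = false;
    return 0;
}

/* Variant 2: drop all linked VFs together with the PF. */
static int toy_remove_pf_cascade(struct toy_pdev *pf)
{
    assert(pf);
    while (pf->nr_vfs) {
        struct toy_pdev *vf = pf->vfs[--pf->nr_vfs];

        vf->present = false;
        pf->vfs[pf->nr_vfs] = NULL;
    }
    pf->present = false;
    return 0;
}
```

With one VF linked, variant 1 returns -EBUSY and leaves both records in
place, whereas variant 2 leaves neither behind. The asymmetry mentioned
above is visible here too: nothing in this model prevents a VF record
from existing before its PF, yet variant 2 never lets it outlive the PF.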

Jan
