On Wed, Jul 16, 2025 at 05:29:00PM -0500, Bjorn Helgaas wrote:
> On Tue, Jul 15, 2025 at 02:28:20AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 14, 2025 at 04:13:51PM -0500, Bjorn Helgaas wrote:
> > > On Mon, Jul 14, 2025 at 02:26:19AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 09, 2025 at 06:38:20PM -0500, Bjorn Helgaas wrote:
> > > > > On Wed, Jul 09, 2025 at 04:55:26PM -0400, Michael S. Tsirkin wrote:
> > > > > > At the moment, in case of a surprise removal, the regular
> > > > > > remove callback is invoked, exclusively. This works well,
> > > > > > because mostly, the cleanup would be the same.
> > > > > >
> > > > > > However, there's a race: imagine device removal was
> > > > > > initiated by a user action, such as driver unbind, and it in
> > > > > > turn initiated some cleanup and is now waiting for an
> > > > > > interrupt from the device. If the device is now
> > > > > > surprise-removed, that never arrives and the remove callback
> > > > > > hangs forever.
> > > > > >
> > > > > > For example, this was reported for virtio-blk:
> > > > > >
> > > > > > 1. the graceful removal is ongoing in the remove() callback,
> > > > > >    where disk deletion del_gendisk() is ongoing, which waits
> > > > > >    for the requests to complete,
> > > > > >
> > > > > > 2. now a few requests are yet to complete, and surprise
> > > > > >    removal starts.
> > > > > >
> > > > > > At this point, the virtio block driver will not get notified
> > > > > > by the driver core layer, because it is likely serializing
> > > > > > the remove() triggered by user/driver unload and the PCI
> > > > > > hotplug driver-initiated device removal. So the vblk driver
> > > > > > doesn't know that the device is removed, and the block layer
> > > > > > keeps waiting for request completions that never arrive. So
> > > > > > del_gendisk() gets stuck.
> > > > > >
> > > > > > Drivers can artificially add timeouts to handle that, but it
> > > > > > can be flaky.
> > > > > >
> > > > > > Instead, let's add a way for the driver to be notified about
> > > > > > the disconnect. It can then do any necessary cleanup, knowing
> > > > > > that the device is inactive.
> > > > >
> > > > > This relies on somebody (typically pciehp, I guess) calling
> > > > > pci_dev_set_disconnected() when a surprise remove happens.
> > > > >
> > > > > Do you think it would be practical for the driver's .remove()
> > > > > method to recognize that the device may stop responding at any
> > > > > point, even if no hotplug driver is present to call
> > > > > pci_dev_set_disconnected()?
> > > > >
> > > > > Waiting forever for an interrupt seems kind of vulnerable in
> > > > > general. Maybe "artificially adding timeouts" is alluding to
> > > > > *not* waiting forever for interrupts? That doesn't seem
> > > > > artificial to me because it's just a fact of life that devices
> > > > > can disappear at arbitrary times.
> > > > >
> > > > > It seems a little fragile to me to depend on some other part of
> > > > > the system to notice the surprise removal and tell you about it
> > > > > or schedule your work function. I think it would be more robust
> > > > > for the driver to check directly, i.e., assume writes to the
> > > > > device may be lost, check for PCI_POSSIBLE_ERROR() after reads
> > > > > from the device, and never wait for an interrupt without a
> > > > > timeout.
> > > >
> > > > virtio is ...
> > > > kind of special, in that users already take it for granted
> > > > that having a device take as long as it wants to respond still
> > > > does not lead to errors and data loss.
> > > >
> > > > Makes it a bit harder, as our timeout would have to check for
> > > > presence and retry; we can't just fail as a normal hardware
> > > > device does.
> > >
> > > Sorry, I don't know enough about virtio to follow what you said
> > > about "having a device take as long as it wants to respond".
> > >
> > > We started with a graceful remove. That must mean the user no
> > > longer needs the device.
> >
> > I'll try to clarify:
> >
> > Indeed, the user will not submit new requests, but users might
> > also not know that some old requests are still being processed by
> > the device. The driver, currently, waits for that processing to
> > complete. Cancelling that with a reset on a timeout might be a
> > regression, unless the timeout is very long.
>
> This seems like a corner case and maybe rare enough that simply
> making the timeout very long would be a possibility.
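
To make "check for presence and retry" concrete, here is a rough
sketch of the idea (not actual virtio-blk code; my_dev_*, done_wq and
the 1-second poll interval are hypothetical, made up for illustration):

#include <linux/pci.h>
#include <linux/wait.h>
#include <linux/jiffies.h>
#include <linux/atomic.h>

/* Hypothetical driver state, for illustration only. */
struct my_dev {
	struct pci_dev *pdev;
	wait_queue_head_t done_wq;	/* woken as requests complete */
	atomic_t inflight;		/* outstanding requests */
};

/* Hypothetical: fail/complete whatever requests are still pending. */
static void my_dev_fail_outstanding(struct my_dev *dev);

/*
 * Instead of waiting forever in remove() for outstanding requests,
 * wait in bounded chunks and re-check that the device is still
 * present.  A slow device just keeps us waiting (which virtio
 * permits); a surprise-removed device makes us give up and fail the
 * remaining requests rather than hang.
 */
static void my_dev_drain_requests(struct my_dev *dev)
{
	while (!wait_event_timeout(dev->done_wq,
				   atomic_read(&dev->inflight) == 0,
				   msecs_to_jiffies(1000))) {
		if (!pci_device_is_present(dev->pdev)) {
			my_dev_fail_outstanding(dev);
			return;
		}
		/* Device is still there, just slow: keep waiting. */
	}
}
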
Indeed, the timeout needs to be very long: the average wait would still
be reasonable, but the worst case is terrible, and the user can't
insert a replacement card all this time. The system is perceived as
flaky.

> > > > And there's the overhead thing - poking at the device a lot
> > > > puts a high load on the host.
> > >
> > > Checking for PCI_POSSIBLE_ERROR() doesn't touch the device. If
> > > you did a config read already, and the result happened to be ~0,
> > > *then* we have the problem of figuring out whether the actual
> > > data from the device was ~0, or if the read failed and the Root
> > > Complex synthesized the ~0. In many cases a driver knows that ~0
> > > is not a possible register value. Otherwise it might have to read
> > > another register that is known not to be ~0.
> >
> > To clarify, virtio is generally designed to operate solely by
> > means of DMA and interrupts, completely avoiding any PCI reads,
> > because PCI reads are very expensive in virtualized scenarios.
> >
> > The extra overhead I refer to is exactly initiating such a read
> > where there would not be one in normal operation.
>
> Thanks, this part is very helpful. And since config accesses are
> very expensive in *all* environments, I expect most drivers for
> high-performance devices work the same way and only do config
> accesses at probe time.
>
> If that's true, it will make this more understandable if the commit
> log approaches it from that direction and omits virtio specifics.
>
> Bjorn

Will do, thanks a lot!

-- 
MST
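
[For reference, a minimal sketch of the ~0 / PCI_POSSIBLE_ERROR()
check described above.  my_dev_gone() is a hypothetical helper, not an
existing API; the PCI core's pci_device_is_present() does essentially
the same Vendor ID check.]

#include <linux/pci.h>

/*
 * Hypothetical helper, for illustration only: a config read of the
 * Vendor ID that comes back as all-ones (PCI_ERROR_RESPONSE) almost
 * certainly means the read failed and the Root Complex synthesized
 * the ~0, since 0xffff is not a valid Vendor ID.
 */
static bool my_dev_gone(struct pci_dev *pdev)
{
	u16 vendor;

	pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor);
	return PCI_POSSIBLE_ERROR(vendor);
}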