On 19.11.2021 14:16, Oleksandr Andrushchenko wrote: > On 19.11.21 15:00, Jan Beulich wrote: >> On 19.11.2021 13:34, Oleksandr Andrushchenko wrote: >>> Possible locking and other work needed: >>> ======================================= >>> >>> 1. pcidevs_{lock|unlock} is too heavy and is per-host >>> 2. pdev->vpci->lock cannot be used as vpci is freed by vpci_remove_device >>> 3. We may want a dedicated per-domain rw lock to be implemented: >>> >>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h >>> index 28146ee404e6..ebf071893b21 100644 >>> --- a/xen/include/xen/sched.h >>> +++ b/xen/include/xen/sched.h >>> @@ -444,6 +444,7 @@ struct domain >>> >>> #ifdef CONFIG_HAS_PCI >>> struct list_head pdev_list; >>> + rwlock_t vpci_rwlock; >>> + bool vpci_terminating; <- atomic? >>> #endif >>> then vpci_remove_device is a writer (cold path) and vpci_process_pending and >>> vpci_mmio_{read|write} are readers (hot path). >> Right - you need such a lock for other purposes anyway, as per the >> discussion with Julien. > What about bool vpci_terminating? Do you see it as an atomic type or just > bool?
Having seen only ... >>> do_physdev_op(PHYSDEVOP_pci_device_remove) will need >>> hypercall_create_continuation >>> to be implemented, so when re-start removal if need be: >>> >>> vpci_remove_device() >>> { >>> d->vpci_terminating = true; ... this use so far, I can't tell yet. But at a first glance a boolean looks to be what you need. >>> remove vPCI register handlers <- this will cut off PCI_COMMAND >>> emulation among others >>> if ( !write_trylock(d->vpci_rwlock) ) >>> return -ERESTART; >>> xfree(pdev->vpci); >>> pdev->vpci = NULL; >>> } >>> >>> Then this d->vpci_rwlock becomes a dedicated vpci per-domain lock for >>> other operations which may require it, e.g. virtual bus topology can >>> use it when assigning vSBDF etc. >>> >>> 4. vpci_remove_device needs to be removed from vpci_process_pending >>> and do nothing for Dom0 and crash DomU otherwise: >> Why is this? I'm not outright opposed, but I don't immediately see why >> trying to remove the problematic device wouldn't be a reasonable course >> of action anymore. vpci_remove_device() may need to become more careful >> as to not crashing, > vpci_remove_device does not crash, vpci_process_pending does >> though. > Assume we are in an error state in vpci_process_pending *on one of the vCPUs* > and we call vpci_remove_device. vpci_remove_device tries to acquire the > lock and it can't just because there are some other vpci code is running on > other vCPU. > Then what do we do here? We are in SoftIRQ context now and we can't spin > trying to acquire d->vpci_rwlock forever. Neither we can blindly free vpci > structure because it is seen by all vCPUs and may crash them. > > If vpci_remove_device is in hypercall context it just returns -ERESTART and > hypercall continuation helps here. But not in SoftIRQ context. Maybe then you want to invoke this cleanup from RCU context (whether vpci_remove_device() itself or a suitable clone there of is TBD)? (I will admit though that I didn't check whether that would satisfy all constraints.) Then again it also hasn't become clear to me why you use write_trylock() there. The lock contention you describe doesn't, on the surface, look any different from situations elsewhere. Jan