On 19.11.2021 14:16, Oleksandr Andrushchenko wrote:
> On 19.11.21 15:00, Jan Beulich wrote:
>> On 19.11.2021 13:34, Oleksandr Andrushchenko wrote:
>>> Possible locking and other work needed:
>>> =======================================
>>>
>>> 1. pcidevs_{lock|unlock} is too heavy and is per-host
>>> 2. pdev->vpci->lock cannot be used as vpci is freed by vpci_remove_device
>>> 3. We may want a dedicated per-domain rw lock to be implemented:
>>>
>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>> index 28146ee404e6..ebf071893b21 100644
>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -444,6 +444,7 @@ struct domain
>>>
>>>    #ifdef CONFIG_HAS_PCI
>>>        struct list_head pdev_list;
>>> +    rwlock_t vpci_rwlock;
>>> +    bool vpci_terminating; <- atomic?
>>>    #endif
>>> Then vpci_remove_device is a writer (cold path), while vpci_process_pending
>>> and vpci_mmio_{read|write} are readers (hot path).
>> Right - you need such a lock for other purposes anyway, as per the
>> discussion with Julien.
> What about bool vpci_terminating? Do you see it as an atomic type or just 
> bool?

Having seen only ...

>>> do_physdev_op(PHYSDEVOP_pci_device_remove) will need
>>> hypercall_create_continuation to be implemented, so that removal can
>>> be restarted if need be:
>>>
>>> int vpci_remove_device(struct pci_dev *pdev)
>>> {
>>>     struct domain *d = pdev->domain;
>>>
>>>     d->vpci_terminating = true;

... this use so far, I can't tell yet. But at a first glance a boolean
looks to be what you need.

>>>     /*
>>>      * Remove vPCI register handlers here: this cuts off PCI_COMMAND
>>>      * emulation, among others.
>>>      */
>>>     if ( !write_trylock(&d->vpci_rwlock) )
>>>         return -ERESTART;
>>>     xfree(pdev->vpci);
>>>     pdev->vpci = NULL;
>>>     write_unlock(&d->vpci_rwlock);
>>>
>>>     return 0;
>>> }
>>>
>>> Then this d->vpci_rwlock becomes a dedicated vpci per-domain lock for
>>> other operations which may require it, e.g. virtual bus topology can
>>> use it when assigning vSBDF etc.
>>>
>>> 4. The vpci_remove_device call needs to be removed from vpci_process_pending,
>>> which should instead do nothing for Dom0 and crash a DomU otherwise:
>> Why is this? I'm not outright opposed, but I don't immediately see why
>> trying to remove the problematic device wouldn't be a reasonable course
>> of action anymore. vpci_remove_device() may need to become more careful
>> so as not to crash,
> vpci_remove_device does not crash, vpci_process_pending does
>>   though.
> Assume we are in an error state in vpci_process_pending *on one of the vCPUs*
> and we call vpci_remove_device. vpci_remove_device tries to acquire the
> lock and fails simply because some other vPCI code is running on another
> vCPU.
> Then what do we do? We are in SoftIRQ context now, so we can't spin
> trying to acquire d->vpci_rwlock forever. Nor can we blindly free the vpci
> structure, because it is seen by all vCPUs and freeing it may crash them.
> 
> If vpci_remove_device is called in hypercall context, it can just return
> -ERESTART and let a hypercall continuation retry it. That is not an option
> in SoftIRQ context.

Maybe then you want to invoke this cleanup from RCU context (whether
vpci_remove_device() itself or a suitable clone thereof is TBD)? (I
will admit though that I didn't check whether that would satisfy all
constraints.)

Then again it also hasn't become clear to me why you use write_trylock()
there. The lock contention you describe doesn't, on the surface, look
any different from situations elsewhere.

Jan

