On 22.11.21 16:57, Jan Beulich wrote:
> On 22.11.2021 15:45, Oleksandr Andrushchenko wrote:
>>
>> On 22.11.21 16:37, Jan Beulich wrote:
>>> On 22.11.2021 15:21, Oleksandr Andrushchenko wrote:
>>>> On 19.11.21 15:34, Oleksandr Andrushchenko wrote:
>>>>> On 19.11.21 15:25, Jan Beulich wrote:
>>>>>> On 19.11.2021 14:16, Oleksandr Andrushchenko wrote:
>>>>>>> On 19.11.21 15:00, Jan Beulich wrote:
>>>>>>>> On 19.11.2021 13:34, Oleksandr Andrushchenko wrote:
>>>>>>>>> Possible locking and other work needed:
>>>>>>>>> =======================================
>>>>>>>>>
>>>>>>>>> 1. pcidevs_{lock|unlock} is too heavy and is per-host
>>>>>>>>> 2. pdev->vpci->lock cannot be used as vpci is freed by
>>>>>>>>> vpci_remove_device
>>>>>>>>> 3. We may want a dedicated per-domain rw lock to be implemented:
>>>>>>>>>
>>>>>>>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>>>>>>>> index 28146ee404e6..ebf071893b21 100644
>>>>>>>>> --- a/xen/include/xen/sched.h
>>>>>>>>> +++ b/xen/include/xen/sched.h
>>>>>>>>> @@ -444,6 +444,7 @@ struct domain
>>>>>>>>>
>>>>>>>>> #ifdef CONFIG_HAS_PCI
>>>>>>>>> struct list_head pdev_list;
>>>>>>>>> + rwlock_t vpci_rwlock;
>>>>>>>>> + bool vpci_terminating; <- atomic?
>>>>>>>>> #endif
>>>>>>>>> then vpci_remove_device is a writer (cold path) and
>>>>>>>>> vpci_process_pending and
>>>>>>>>> vpci_mmio_{read|write} are readers (hot path).
>>>>>>>> Right - you need such a lock for other purposes anyway, as per the
>>>>>>>> discussion with Julien.
>>>>>>> What about bool vpci_terminating? Do you see it as an atomic type or
>>>>>>> just bool?
>>>>>> Having seen only ...
>>>>>>
>>>>>>>>> do_physdev_op(PHYSDEVOP_pci_device_remove) will need
>>>>>>>>> hypercall_create_continuation
>>>>>>>>> to be implemented, so when re-start removal if need be:
>>>>>>>>>
>>>>>>>>> vpci_remove_device()
>>>>>>>>> {
>>>>>>>>> d->vpci_terminating = true;
>>>>>> ... this use so far, I can't tell yet. But at a first glance a boolean
>>>>>> looks to be what you need.
>>>>>>
>>>>>>>>> remove vPCI register handlers <- this will cut off
>>>>>>>>> PCI_COMMAND emulation among others
>>>>>>>>> if ( !write_trylock(d->vpci_rwlock) )
>>>>>>>>> return -ERESTART;
>>>>>>>>> xfree(pdev->vpci);
>>>>>>>>> pdev->vpci = NULL;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Then this d->vpci_rwlock becomes a dedicated vpci per-domain lock for
>>>>>>>>> other operations which may require it, e.g. virtual bus topology can
>>>>>>>>> use it when assigning vSBDF etc.
>>>>>>>>>
>>>>>>>>> 4. vpci_remove_device needs to be removed from vpci_process_pending
>>>>>>>>> and do nothing for Dom0 and crash DomU otherwise:
>>>>>>>> Why is this? I'm not outright opposed, but I don't immediately see why
>>>>>>>> trying to remove the problematic device wouldn't be a reasonable course
>>>>>>>> of action anymore. vpci_remove_device() may need to become more careful
>>>>>>>> as to not crashing,
>>>>>>> vpci_remove_device does not crash, vpci_process_pending does
>>>>>>>> though.
>>>>>>> Assume we are in an error state in vpci_process_pending *on one of the
>>>>>>> vCPUs*
>>>>>>> and we call vpci_remove_device. vpci_remove_device tries to acquire the
>>>>>>> lock and it can't just because there are some other vpci code is
>>>>>>> running on other vCPU.
>>>>>>> Then what do we do here? We are in SoftIRQ context now and we can't spin
>>>>>>> trying to acquire d->vpci_rwlock forever. Neither we can blindly free
>>>>>>> vpci
>>>>>>> structure because it is seen by all vCPUs and may crash them.
>>>>>>>
>>>>>>> If vpci_remove_device is in hypercall context it just returns -ERESTART
>>>>>>> and
>>>>>>> hypercall continuation helps here. But not in SoftIRQ context.
>>>>>> Maybe then you want to invoke this cleanup from RCU context (whether
>>>>>> vpci_remove_device() itself or a suitable clone there of is TBD)? (I
>>>>>> will admit though that I didn't check whether that would satisfy all
>>>>>> constraints.)
>>>>>>
>>>>>> Then again it also hasn't become clear to me why you use write_trylock()
>>>>>> there. The lock contention you describe doesn't, on the surface, look
>>>>>> any different from situations elsewhere.
>>>>> I use write_trylock in vpci_remove_device because if we can't
>>>>> acquire the lock then we defer device removal. This would work
>>>>> well if called from a hypercall which will employ hypercall continuation.
>>>>> But SoftIRQ getting -ERESTART is something that we can't probably
>>>>> handle by restarting as hypercall can, thus I only see that
>>>>> vpci_process_pending
>>>>> will need to spin and wait until vpci_remove_device succeeds.
>>>> Does anybody have any better solution for preventing SoftIRQ from
>>>> spinning on vpci_remove_device and -ERESTART?
>>> Well, at this point I can suggest only a marginal improvement: Instead of
>>> spinning inside the softirq handler, you want to re-raise the softirq and
>>> exit the handler. That way at least higher "priority" softirqs won't be
>>> starved.
>>>
>>> Beyond that - maybe the guest (or just a vcpu of it) needs pausing in such
>>> an event, with the work deferred to a tasklet?
>>>
>>> Yet I don't think my earlier question regarding the use of write_trylock()
>>> was really answered. What you said in reply doesn't explain (to me at
>>> least) why write_lock() is not an option.
>> I was thinking that we do not want to freeze in case we are calling
>> vpci_remove_device
>> from SoftIRQ context, thus we try to lock and if we can't we return -ERESTART
>> indicating that the removal needs to be deferred. If we use write_lock, then
>> SoftIRQ -> write_lock will spin there waiting for readers to release the
>> lock.
>>
>> write_lock actually makes things a lot easier, but I just don't know if it
>> is ok to use it. If so, then vpci_remove_device becomes synchronous and
>> there is no need in hypercall continuation and other heavy machinery for
>> re-scheduling SoftIRQ...
> I'm inclined to ask: If it wasn't okay to use here, then where would it be
> okay to use? Of course I realize there are cases when long spinning times
> can be a problem.
I can't prove, but I have a feeling that write_lock could be less
"harmful" in case of a hypercall rather than SoftIRQ
> But I expect there aren't going to be excessively long
> lock holding regions for this lock, and I also would expect average
> contention to not be overly bad.
Yes, this is my impression as well
> But in the end you know better the code
> that you're writing (and which may lead to issues with the lock usage) than
> I do ...
I am pretty much ok with write_lock as it does make things way easier.
So I'll got with write_lock then and if we spot great contention then we
can improve that
>
> Jan
>
Thank you!!
Oleksandr