On 18.11.21 16:04, Roger Pau Monné wrote:
> Sorry, I've been quite busy with other stuff.
>
> On Thu, Nov 18, 2021 at 01:48:06PM +0000, Oleksandr Andrushchenko wrote:
>>
>> On 18.11.21 15:25, Jan Beulich wrote:
>>> On 18.11.2021 10:32, Oleksandr Andrushchenko wrote:
>>>> On 18.11.21 11:15, Jan Beulich wrote:
>>>>> On 18.11.2021 09:54, Oleksandr Andrushchenko wrote:
>>>>>> On 18.11.21 10:36, Jan Beulich wrote:
>>>>>>> On 18.11.2021 08:49, Oleksandr Andrushchenko wrote:
>>>>>>>> On 17.11.21 10:28, Jan Beulich wrote:
>>>>>>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>
>>>>>>>>>>
>>>>>>>>>> When a vPCI is removed for a PCI device it is possible that we have
>>>>>>>>>> scheduled a delayed work for map/unmap operations for that device.
>>>>>>>>>> For example, the following scenario can illustrate the problem:
>>>>>>>>>>
>>>>>>>>>> pci_physdev_op
>>>>>>>>>>         pci_add_device
>>>>>>>>>>             init_bars -> modify_bars -> defer_map -> raise_softirq(SCHEDULE_SOFTIRQ)
>>>>>>>>>>         iommu_add_device <- FAILS
>>>>>>>>>>         vpci_remove_device -> xfree(pdev->vpci)
>>>>>>>>>>
>>>>>>>>>> leave_hypervisor_to_guest
>>>>>>>>>>         vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL
>>>>>>>>>>
>>>>>>>>>> For the hardware domain we continue execution as the worst that
>>>>>>>>>> could happen is that MMIO mappings are left in place when the
>>>>>>>>>> device has been deassigned.
>>>>>>>>>>
>>>>>>>>>> For unprivileged domains that get a failure in the middle of a vPCI
>>>>>>>>>> {un}map operation we need to destroy them, as we don't know in which
>>>>>>>>>> state the p2m is. This can only happen in vpci_process_pending for
>>>>>>>>>> DomUs as they won't be allowed to call pci_add_device.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>
>>>>>>>>> Thinking about it some more, I'm not convinced any of this is really
>>>>>>>>> needed in the presented form.
>>>>>>>> The intention of this patch was to handle error conditions which are
>>>>>>>> abnormal, e.g. when iommu_add_device fails and we are in the middle
>>>>>>>> of initialization. So, I am trying to cancel any pending work which
>>>>>>>> might already be there, rather than crash.
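(For illustration, a minimal sketch of what "cancel the pending work" could
look like; the helper name vpci_cancel_pending() and its call site in
vpci_remove_device() are assumptions, not the actual patch:)

/*
 * Hypothetical helper, assumed to be called from vpci_remove_device()
 * before xfree(pdev->vpci): drop any map/unmap work deferred for this
 * device so vpci_process_pending() has nothing stale left to pick up.
 */
void vpci_cancel_pending(const struct pci_dev *pdev)
{
    struct vcpu *v = current;

    /* Deferred work is recorded on the vCPU that scheduled it. */
    if ( v->vpci.mem && v->vpci.pdev == pdev )
    {
        rangeset_destroy(v->vpci.mem);
        v->vpci.mem = NULL;
    }
}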
>>>>>>> Only Dom0 may be able to prematurely access the device during "add".
>>>>>>> Yet unlike for DomU-s we generally expect Dom0 to be well-behaved.
>>>>>>> Hence I'm not sure I see the need for dealing with these.
>>>>>> Probably I don't follow you here. The issue I am facing is Dom0
>>>>>> related, e.g. Xen was not able to initialize the device during "add"
>>>>>> and thus wanted to clean up the leftovers. As a result the already
>>>>>> scheduled work crashes, as it was neither canceled nor interrupted
>>>>>> in any safe manner. So, this sounds like something we need to take
>>>>>> care of, hence this patch.
>>>>> But my point was the question of why there would be any pending work
>>>>> in the first place in this case, when we expect Dom0 to be well-behaved.
>>>> I am not saying Dom0 misbehaves here. This is my real use-case
>>>> (as in the commit message):
>>>>
>>>> pci_physdev_op
>>>>         pci_add_device
>>>>             init_bars -> modify_bars -> defer_map -> raise_softirq(SCHEDULE_SOFTIRQ)
>>>>         iommu_add_device <- FAILS
>>>>         vpci_remove_device -> xfree(pdev->vpci)
>>>>
>>>> leave_hypervisor_to_guest
>>>>         vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL
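(For reference, a paraphrased, not verbatim, excerpt of the current
vpci_process_pending(), showing where the scenario above blows up: once the
rangeset has been consumed, pdev->vpci->lock is taken without checking that
pdev->vpci is still there.)

bool vpci_process_pending(struct vcpu *v)
{
    if ( v->vpci.mem )
    {
        struct map_data data = {
            .d = v->domain,
            .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
        };

        if ( rangeset_consume_ranges(v->vpci.mem, map_range, &data) ==
             -ERESTART )
            return true;

        spin_lock(&v->vpci.pdev->vpci->lock); /* <- NULL after xfree() above */
        /* ... decoding update and rangeset teardown elided ... */
        spin_unlock(&v->vpci.pdev->vpci->lock);
    }

    return false;
}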
>>> First of all I'm sorry for having lost track of that particular case in
>>> the course of the discussion.
>> No problem, I see on the ML how much you review every day...
>>> I wonder though whether that's something we really need to take care of.
>>> At boot (on x86) modify_bars() wouldn't call defer_map() anyway, but
>>> use apply_map() instead. I wonder whether this wouldn't be appropriate
>>> generally in the context of init_bars() when used for Dom0 (not sure
>>> whether init_bars() would find some form of use for DomU-s as well).
>>> This is even more so as it would better be the exception that devices
>>> discovered post-boot start out with memory decoding enabled (which is a
>>> prereq for modify_bars() to be called in the first place).
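(For context, the tail of modify_bars() that makes the distinction Jan
mentions looks roughly like this; paraphrased, not verbatim:)

    /* Tail of modify_bars(): while Xen is still booting the mapping cannot
     * be deferred to guest entry, so it is applied right away; only after
     * boot is the work handed to vpci_process_pending() via defer_map(). */
    if ( system_state < SYS_STATE_active )
        return apply_map(pdev->domain, pdev, mem, cmd);

    defer_map(pdev->domain, pdev, mem, cmd, rom_only);

    return 0;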
>> Well, first of all init_bars is going to be used for DomUs as well:
>> that was discussed previously and is reflected in this series.
>>
>> But the real use-case for the deferred mapping would be the one
>> from PCI_COMMAND register write emulation:
>>
>> void vpci_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>                       uint32_t cmd, void *data)
>> {
>> [snip]
>>           modify_bars(pdev, cmd, false);
>>
>> which in turn calls defer_map.
>>
>> I believe Roger did that for a good reason: not to stall Xen while doing
>> map/unmap (which may take quite some time), so he moved that work to
>> vpci_process_pending and SOFTIRQ context.
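(Roughly, defer_map() just records the work on the current vCPU and pokes
the scheduler, so the actual mapping happens on the way back to the guest;
paraphrased, not verbatim:)

static void defer_map(struct domain *d, struct pci_dev *pdev,
                      struct rangeset *mem, uint16_t cmd, bool rom_only)
{
    struct vcpu *curr = current;

    /* Stash the pending work; vpci_process_pending() consumes it on the
     * way out to the guest. */
    curr->vpci.pdev = pdev;
    curr->vpci.mem = mem;
    curr->vpci.cmd = cmd;
    curr->vpci.rom_only = rom_only;

    /* Make sure the softirq path runs before the guest resumes. */
    raise_softirq(SCHEDULE_SOFTIRQ);
}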
> Indeed. In the physdevop failure case this comes from a hypercall
> context, so maybe you could do the mapping in place using hypercall
> continuations if required. Not sure how complex that would be,
> compared to just deferring to the guest entry point and then dealing
> with the possible cleanup on failure.
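(A very rough sketch of that suggestion: do the mapping right in the
physdevop handler and use a continuation when preempted. Only
rangeset_consume_ranges(), map_range()/struct map_data and
hypercall_create_continuation() are existing interfaces; the helper itself
and how 'arg' is threaded through are made up for illustration.)

static int map_in_hypercall_context(const struct pci_dev *pdev,
                                    struct rangeset *mem, uint16_t cmd,
                                    XEN_GUEST_HANDLE_PARAM(void) arg)
{
    struct map_data data = {
        .d = pdev->domain,
        .map = cmd & PCI_COMMAND_MEMORY,
    };
    int rc = rangeset_consume_ranges(mem, map_range, &data);

    if ( rc == -ERESTART )
        /* Not done yet: have the PHYSDEVOP_pci_device_add call restarted. */
        return hypercall_create_continuation(__HYPERVISOR_physdev_op, "ih",
                                             PHYSDEVOP_pci_device_add, arg);

    return rc;
}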
This will solve one part of the equation:

pci_physdev_op
        pci_add_device
            init_bars -> modify_bars -> defer_map -> raise_softirq(SCHEDULE_SOFTIRQ)
        iommu_add_device <- FAILS
        vpci_remove_device -> xfree(pdev->vpci)

But what about the other one, e.g. when vpci_process_pending is triggered
in parallel with a PCI device de-assign?
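Something along the lines of the sketch below is what this patch is after
for that case; the check on pdev->vpci is illustrative only, and the exact
locking around it is still an open question:

bool vpci_process_pending(struct vcpu *v)
{
    if ( v->vpci.mem )
    {
        if ( !v->vpci.pdev->vpci )
        {
            /* The device lost its vPCI state meanwhile: drop the work. */
            rangeset_destroy(v->vpci.mem);
            v->vpci.mem = NULL;
            return false;
        }

        /* ... existing mapping logic ... */
    }

    return false;
}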

>
> Thanks, Roger.
Thank you,
Oleksandr
