On 11.02.22 13:40, Roger Pau Monné wrote: > On Fri, Feb 11, 2022 at 07:27:39AM +0000, Oleksandr Andrushchenko wrote: >> Hi, Roger! >> >> On 10.02.22 18:16, Roger Pau Monné wrote: >>> On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote: >>>> From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> >>>> >>>> Introduce a per-domain read/write lock to check whether vpci is present, >>>> so we are sure there are no accesses to the contents of the vpci struct >>>> if not. This lock can be used (and in a few cases is used right away) >>>> so that vpci removal can be performed while holding the lock in write >>>> mode. Previously such removal could race with vpci_read for example. >>> Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt >>> pci_remove_device, and likely when vPCI gets also used in >>> {de}assign_device I think. >> Yes, this is indeed an issue, but I was not trying to solve it in >> context of vPCI locking yet. I think we should discuss how do >> we approach pdev locking, so I can create a patch for that. >> that being said, I would like not to solve pdev in this patch yet >> >> ...I do understand we do want to avoid that, but at the moment >> a single reliable way for making sure pdev is alive seems to >> be pcidevs_lock.... > I think we will need to make pcidevs_lock a rwlock and take it in read > mode for pci_get_pdev_by_domain. > > We didn't have this scenario before where PCI emulation is done in the > hypervisor, and hence the locking around those data structures has not > been designed for those use-cases. Yes, I do understand that. I hope pcidevs lock move to rwlock can be done as a separate patch. While this is not done, do you think we can proceed with vPCI series and pcidevs locking re-work being done after?
> >>>> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure >>>> from being removed. >>>> >>>> 2. Writing the command register and ROM BAR register may trigger >>>> modify_bars to run, which in turn may access multiple pdevs while >>>> checking for the existing BAR's overlap. The overlapping check, if done >>>> under the read lock, requires vpci->lock to be acquired on both devices >>>> being compared, which may produce a deadlock. It is not possible to >>>> upgrade read lock to write lock in such a case. So, in order to prevent >>>> the deadlock, check which registers are going to be written and acquire >>>> the lock in the appropriate mode from the beginning. >>>> >>>> All other code, which doesn't lead to pdev->vpci destruction and does not >>>> access multiple pdevs at the same time, can still use a combination of the >>>> read lock and pdev->vpci->lock. >>>> >>>> 3. Optimize if ROM BAR write lock required detection by caching offset >>>> of the ROM BAR register in vpci->header->rom_reg which depends on >>>> header's type. >>>> >>>> 4. Reduce locked region in vpci_remove_device as it is now possible >>>> to set pdev->vpci to NULL early right after the write lock is acquired. >>>> >>>> 5. Reduce locked region in vpci_add_handlers as it is possible to >>>> initialize many more fields of the struct vpci before assigning it to >>>> pdev->vpci. >>>> >>>> 6. vpci_{add|remove}_register are required to be called with the write lock >>>> held, but it is not feasible to add an assert there as it requires >>>> struct domain to be passed for that. So, add a comment about this >>>> requirement >>>> to these and other functions with the equivalent constraints. >>>> >>>> 7. Drop const qualifier where the new rwlock is used and this is >>>> appropriate. >>>> >>>> 8. This is based on the discussion at [1]. >>>> >>>> [1] >>>> https://urldefense.com/v3/__https://lore.kernel.org/all/20220204063459.680961-4-andr2...@gmail.com/__;!!GF_29dbcQIUBPA!gObSySzN7s6zSKrcpSEi6vw18fRPls157cuRoqq4KDd7Ic_Nvh_cFlyVXPRpEWBkI38pgsvvfg$ >>>> [lore[.]kernel[.]org] >>>> >>>> Suggested-by: Roger Pau Monné <roger....@citrix.com> >>>> Suggested-by: Jan Beulich <jbeul...@suse.com> >>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> >>>> >>>> --- >>>> This was checked on x86: with and without PVH Dom0. >>>> --- >>>> xen/arch/x86/hvm/vmsi.c | 2 + >>>> xen/common/domain.c | 3 + >>>> xen/drivers/vpci/header.c | 8 +++ >>>> xen/drivers/vpci/msi.c | 8 ++- >>>> xen/drivers/vpci/msix.c | 40 +++++++++++-- >>>> xen/drivers/vpci/vpci.c | 114 ++++++++++++++++++++++++++++---------- >>>> xen/include/xen/sched.h | 3 + >>>> xen/include/xen/vpci.h | 2 + >>>> 8 files changed, 146 insertions(+), 34 deletions(-) >>>> >>>> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c >>>> index 13e2a190b439..351cb968a423 100644 >>>> --- a/xen/arch/x86/hvm/vmsi.c >>>> +++ b/xen/arch/x86/hvm/vmsi.c >>>> @@ -893,6 +893,8 @@ int vpci_msix_arch_print(const struct vpci_msix *msix) >>>> { >>>> unsigned int i; >>>> >>>> + ASSERT(!!rw_is_locked(&msix->pdev->domain->vpci_rwlock)); >>> ^ no need for the double negation. >> Ok, will update all asserts which use !! >>> Also this asserts that the lock is taken, but could be by a different >>> pCPU. I guess it's better than nothing. >> Fair enough. Do you still want the asserts or should I remove them? > Likely fine to leave them. Ok > >>>> + >>>> for ( i = 0; i < msix->max_entries; i++ ) >>>> { >>>> const struct vpci_msix_entry *entry = &msix->entries[i]; >>> Since this function is now called with the per-domain rwlock read >>> locked it's likely not appropriate to call process_pending_softirqs >>> while holding such lock (check below). >> You are right, as it is possible that: >> >> process_pending_softirqs -> vpci_process_pending -> read_lock >> >> Even more, vpci_process_pending may also >> >> read_unlock -> vpci_remove_device -> write_lock >> >> in its error path. So, any invocation of process_pending_softirqs >> must not hold d->vpci_rwlock at least. >> >> And also we need to check that pdev->vpci was not removed >> in between or *re-created* >>> We will likely need to re-iterate over the list of pdevs assigned to >>> the domain and assert that the pdev is still assigned to the same >>> domain. >> So, do you mean a pattern like the below should be used at all >> places where we need to call process_pending_softirqs? >> >> read_unlock >> process_pending_softirqs >> read_lock >> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn); >> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) ) >> <continue processing> > Something along those lines. You likely need to continue iterate using > for_each_pdev. I'll try to play around it and see what will it look like > >>>> +{ >>>> + /* >>>> + * Writing the command register and ROM BAR register may trigger >>>> + * modify_bars to run which in turn may access multiple pdevs while >>>> + * checking for the existing BAR's overlap. The overlapping check, if >>>> done >>>> + * under the read lock, requires vpci->lock to be acquired on both >>>> devices >>>> + * being compared, which may produce a deadlock. It is not possible to >>>> + * upgrade read lock to write lock in such a case. So, in order to >>>> prevent >>>> + * the deadlock, check which registers are going to be written and >>>> acquire >>>> + * the lock in the appropriate mode from the beginning. >>>> + */ >>>> + if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) ) >>>> + return true; >>>> + >>>> + if ( !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4) ) >>> No need for the comparison if rom_reg is unset. Also you can OR both >>> conditions into a single if. >> If we open code vpci_offset_cmp with a single if all this is going >> to be a bit clumsy: >> >> if ( r1_offset < r2_offset + r2_size && >> r2_offset < r1_offset + r1_size ) >> return 0; >> This is a single check. >> Now we need to check two registers with the code above and >> also check that pdev->vpci->header.rom_reg != 0 >> >> I think it would be more readable if we have a tiny helper function >> >> static bool vpci_offset_cmp(unsigned int r1_offset, unsigned int r1_size, >> unsigned int r2_offset, unsigned int r2_size) >> { >> /* Return 0 if registers overlap. */ >> if ( r1_offset < r2_offset + r2_size && >> r2_offset < r1_offset + r1_size ) >> return false; >> return true; >> } Do you mean this helper to be converted into static bool overlaps(unsigned int r1_offset, unsigned int r1_size, unsigned int r2_offset, unsigned int r2_size) >> >> So, then we can have something like >> >> static bool vpci_header_write_lock(const struct pci_dev *pdev, >> unsigned int start, unsigned int size) >> { >> if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) || >> (pdev->vpci->header.rom_reg && >> !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4)) ) >> return true; >> >> return false; >> } > Just create an 'overlaps' static function in header.c. Please see above > > Thanks, Roger. Thank you, Oleksandr