> -----Original Message-----
> From: Ankur Arora <ankur.a.ar...@oracle.com>
> Sent: Monday, March 8, 2021 12:29 PM
> To: Bharat Bhushan <bbhush...@marvell.com>
> Cc: alex.william...@redhat.com; linux-kernel@vger.kernel.org; Sunil Kovvuri
> Goutham <sgout...@marvell.com>; termi...@gmail.com
> Subject: Re: [EXT] Re: vfio-pci: protect remap_pfn_range() from simultaneous
> calls
> 
> On 2021-03-02 4:47 a.m., Bharat Bhushan wrote:
> > Hi Ankur,
> >
> >> -----Original Message-----
> >> From: Ankur Arora <ankur.a.ar...@oracle.com>
> >> Sent: Friday, February 26, 2021 6:24 AM
> >> To: Bharat Bhushan <bbhush...@marvell.com>
> >> Cc: alex.william...@redhat.com; ankur.a.ar...@oracle.com; linux-
> >> ker...@vger.kernel.org; Sunil Kovvuri Goutham <sgout...@marvell.com>;
> >> termi...@gmail.com
> >> Subject: [EXT] Re: vfio-pci: protect remap_pfn_range() from
> >> simultaneous calls
> >>
> >> External Email
> >>
> >> ---------------------------------------------------------------------
> >> -
> >> Hi Bharat,
> >>
> >> Can you test the patch below to see if it works for you?
> >
> > Sorry for late reply, I actually missed this email.
> >
> > Reproducibility of the issue was low in my test scenario, one out of ~15 
> > runs. I
> run it multiple times, overnight and observed no issues.
> 
> Awesome. Would you mind giving me your Tested-by for the patch?

Sure, please point if this is already sent for review.

> 
> >> Also could you add some more detail to your earlier description of
> >> the bug?
> >
> > Our test case is running ODP multi-threaded application, where parent 
> > process
> maps (yes it uses MAP_DMA) the region and then child processes access same.
> As a workaround we tried accessing the region once by parent process before
> creating other accessor threads and it worked as expected.
> 
> Thanks for the detail. So if the child processes start early -- they might 
> fault while
> the VFIO_IOMMU_MAP_DMA was going on. And, since they only acquire
> mmap_lock in RO mode, both paths would end up calling io_remap_pfn_range()
> via the fault handler.

Yes, that's correct.

Thanks
-Bharat

> 
> Thanks
> Ankur
> 
> >
> > Thanks
> > -Bharat
> >
> >> In particular, AFAICS you are using ODP (-DPDK?) with multiple
> >> threads touching this region. From your stack, it looks like the
> >> fault was user-space generated, and I'm guessing you were not using
> >> the VFIO_IOMMU_MAP_DMA.
> >>
> >> Ankur
> >>
> >> -- >8 --
> >>
> >> Subject: [PATCH] vfio-pci: protect io_remap_pfn_range() from
> >> simultaneous calls
> >>
> >> vfio_pci_mmap_fault() maps the complete VMA on fault. With concurrent
> >> faults, this would result in multiple calls to io_remap_pfn_range(),
> >> where it would hit a BUG_ON(!pte_none(*pte)) in remap_pte_range().
> >> (It would also link the same VMA multiple times in vdev->vma_list but
> >> given the BUG_ON that is less serious.)
> >>
> >> Normally, however, this won't happen -- at least with
> >> vfio_iommu_type1 -- the VFIO_IOMMU_MAP_DMA path is protected by
> iommu->lock.
> >>
> >> If, however, we are using some kind of parallelization mechanism like
> >> this one with ktask under discussion [1], we would hit this.
> >> Even if we were doing this serially, given that vfio-pci remaps a
> >> larger extent than strictly necessary it should internally enforce
> >> coherence of its data structures.
> >>
> >> Handle this by using the VMA's presence in the vdev->vma_list as
> >> indicative of a fully mapped VMA and returning success early to all
> >> but the first VMA fault. Note that this is clearly optimstic given
> >> that the mapping is ongoing, and might mean that the caller sees more
> >> faults until the remap is done.
> >>
> >> [1]
> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_
> >> linux-
> >> 2Dmm_20181105145141.6f9937f6-
> >>
> 40w520.home_&d=DwIDAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=PAAlWswPe7d8gHl
> >>
> GbCLmy2YezyK7O3Hv_t2heGnouBw&m=3ZDXqnn9xNUCjgXwN9mHIKT7oyXu55P
> >> U7yV2j0b-5hw&s=hiICkNtrcH4AbAWRrbkvMUylp7Bv0YHFCjxNGC6CGOk&e=
> >>
> >> Signed-off-by: Ankur Arora <ankur.a.ar...@oracle.com>
> >> ---
> >>   drivers/vfio/pci/vfio_pci.c | 25 ++++++++++++++++++++++++-
> >>   1 file changed, 24 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci.c
> >> b/drivers/vfio/pci/vfio_pci.c index 65e7e6b44578..b9f509863db1 100644
> >> --- a/drivers/vfio/pci/vfio_pci.c
> >> +++ b/drivers/vfio/pci/vfio_pci.c
> >> @@ -1573,6 +1573,11 @@ static int __vfio_pci_add_vma(struct
> >> vfio_pci_device *vdev,
> >>   {
> >>    struct vfio_pci_mmap_vma *mmap_vma;
> >>
> >> +  list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
> >> +          if (mmap_vma->vma == vma)
> >> +                  return 1;
> >> +  }
> >> +
> >>    mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL);
> >>    if (!mmap_vma)
> >>            return -ENOMEM;
> >> @@ -1613,6 +1618,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct
> >> vm_fault
> >> *vmf)
> >>    struct vm_area_struct *vma = vmf->vma;
> >>    struct vfio_pci_device *vdev = vma->vm_private_data;
> >>    vm_fault_t ret = VM_FAULT_NOPAGE;
> >> +  int vma_present;
> >>
> >>    mutex_lock(&vdev->vma_lock);
> >>    down_read(&vdev->memory_lock);
> >> @@ -1623,7 +1629,21 @@ static vm_fault_t vfio_pci_mmap_fault(struct
> >> vm_fault *vmf)
> >>            goto up_out;
> >>    }
> >>
> >> -  if (__vfio_pci_add_vma(vdev, vma)) {
> >> +  /*
> >> +   * __vfio_pci_add_vma() either adds the vma to the vdev->vma_list
> >> +   * (vma_present == 0), or indicates that the vma is already present
> >> +   * on the list (vma_present == 1).
> >> +   *
> >> +   * Overload the meaning of this flag to also imply that the vma is
> >> +   * fully mapped. This allows us to serialize the mapping -- ensuring
> >> +   * that simultaneous faults will not both try to call
> >> +   * io_remap_pfn_range().
> >> +   *
> >> +   * However, this might mean that callers to which we returned success
> >> +   * optimistically will see more faults until the remap is complete.
> >> +   */
> >> +  vma_present = __vfio_pci_add_vma(vdev, vma);
> >> +  if (vma_present < 0) {
> >>            ret = VM_FAULT_OOM;
> >>            mutex_unlock(&vdev->vma_lock);
> >>            goto up_out;
> >> @@ -1631,6 +1651,9 @@ static vm_fault_t vfio_pci_mmap_fault(struct
> >> vm_fault
> >> *vmf)
> >>
> >>    mutex_unlock(&vdev->vma_lock);
> >>
> >> +  if (vma_present)
> >> +          goto up_out;
> >> +
> >>    if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> >>                           vma->vm_end - vma->vm_start, vma-
> >>> vm_page_prot))
> >>            ret = VM_FAULT_SIGBUS;
> >> --
> >> 2.29.2
> >

Reply via email to