> -----Original Message-----
> From: Ankur Arora <ankur.a.ar...@oracle.com>
> Sent: Monday, March 8, 2021 12:29 PM
> To: Bharat Bhushan <bbhush...@marvell.com>
> Cc: alex.william...@redhat.com; linux-kernel@vger.kernel.org; Sunil Kovvuri Goutham <sgout...@marvell.com>; termi...@gmail.com
> Subject: Re: [EXT] Re: vfio-pci: protect remap_pfn_range() from simultaneous calls
>
> On 2021-03-02 4:47 a.m., Bharat Bhushan wrote:
> > Hi Ankur,
> >
> >> -----Original Message-----
> >> From: Ankur Arora <ankur.a.ar...@oracle.com>
> >> Sent: Friday, February 26, 2021 6:24 AM
> >> To: Bharat Bhushan <bbhush...@marvell.com>
> >> Cc: alex.william...@redhat.com; ankur.a.ar...@oracle.com; linux-ker...@vger.kernel.org; Sunil Kovvuri Goutham <sgout...@marvell.com>; termi...@gmail.com
> >> Subject: [EXT] Re: vfio-pci: protect remap_pfn_range() from simultaneous calls
> >>
> >> Hi Bharat,
> >>
> >> Can you test the patch below to see if it works for you?
> >
> > Sorry for the late reply, I actually missed this email.
> >
> > Reproducibility of the issue was low in my test scenario, about one out of
> > ~15 runs. I ran it multiple times overnight and observed no issues.
>
> Awesome. Would you mind giving me your Tested-by for the patch?

Sure, please point me to it if it has already been sent out for review.

> >> Also could you add some more detail to your earlier description of
> >> the bug?
> >
> > Our test case runs a multi-threaded ODP application, where the parent
> > process maps the region (yes, it uses MAP_DMA) and the child processes
> > then access it. As a workaround we had the parent process access the
> > region once before creating the other accessor threads, and that worked
> > as expected.
>
> Thanks for the detail. So if the child processes start early -- they might
> fault while the VFIO_IOMMU_MAP_DMA is still going on. And, since they only
> acquire mmap_lock in RO mode, both paths would end up calling
> io_remap_pfn_range() via the fault handler.

Yes, that's correct.

Thanks
-Bharat

>
> Thanks
> Ankur
>
> >
> > Thanks
> > -Bharat
> >
> >> In particular, AFAICS you are using ODP (-DPDK?) with multiple
> >> threads touching this region. From your stack, it looks like the
> >> fault was user-space generated, and I'm guessing you were not using
> >> VFIO_IOMMU_MAP_DMA.
> >>
> >> Ankur
> >>
> >> -- >8 --
> >>
> >> Subject: [PATCH] vfio-pci: protect io_remap_pfn_range() from
> >> simultaneous calls
> >>
> >> vfio_pci_mmap_fault() maps the complete VMA on fault. With concurrent
> >> faults, this would result in multiple calls to io_remap_pfn_range(),
> >> which would hit the BUG_ON(!pte_none(*pte)) in remap_pte_range().
> >> (It would also link the same VMA multiple times in vdev->vma_list,
> >> but given the BUG_ON that is less serious.)
> >>
> >> Normally, however, this won't happen -- at least with
> >> vfio_iommu_type1 -- since the VFIO_IOMMU_MAP_DMA path is protected by
> >> iommu->lock.
> >>
> >> If, however, we are using some kind of parallelization mechanism like
> >> the ktask one under discussion [1], we would hit this. And even if we
> >> were doing this serially, given that vfio-pci remaps a larger extent
> >> than strictly necessary, it should internally enforce coherence of
> >> its data structures.
> >>
> >> Handle this by treating the VMA's presence in vdev->vma_list as
> >> indicative of a fully mapped VMA and returning success early to all
> >> but the first VMA fault. Note that this is clearly optimistic given
> >> that the mapping is still ongoing, and might mean that the caller
> >> sees more faults until the remap is done.
> >>
> >> [1] https://lore.kernel.org/linux-mm/20181105145141.6f9937f6@w520.home/
> >>
> >> Signed-off-by: Ankur Arora <ankur.a.ar...@oracle.com>
> >> ---
> >>  drivers/vfio/pci/vfio_pci.c | 25 ++++++++++++++++++++++++-
> >>  1 file changed, 24 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> >> index 65e7e6b44578..b9f509863db1 100644
> >> --- a/drivers/vfio/pci/vfio_pci.c
> >> +++ b/drivers/vfio/pci/vfio_pci.c
> >> @@ -1573,6 +1573,11 @@ static int __vfio_pci_add_vma(struct vfio_pci_device *vdev,
> >>  {
> >>  	struct vfio_pci_mmap_vma *mmap_vma;
> >>
> >> +	list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
> >> +		if (mmap_vma->vma == vma)
> >> +			return 1;
> >> +	}
> >> +
> >>  	mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL);
> >>  	if (!mmap_vma)
> >>  		return -ENOMEM;
> >> @@ -1613,6 +1618,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> >>  	struct vm_area_struct *vma = vmf->vma;
> >>  	struct vfio_pci_device *vdev = vma->vm_private_data;
> >>  	vm_fault_t ret = VM_FAULT_NOPAGE;
> >> +	int vma_present;
> >>
> >>  	mutex_lock(&vdev->vma_lock);
> >>  	down_read(&vdev->memory_lock);
> >> @@ -1623,7 +1629,21 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> >>  		goto up_out;
> >>  	}
> >>
> >> -	if (__vfio_pci_add_vma(vdev, vma)) {
> >> +	/*
> >> +	 * __vfio_pci_add_vma() either adds the vma to the vdev->vma_list
> >> +	 * (vma_present == 0), or indicates that the vma is already present
> >> +	 * on the list (vma_present == 1).
> >> +	 *
> >> +	 * Overload the meaning of this flag to also imply that the vma is
> >> +	 * fully mapped. This allows us to serialize the mapping -- ensuring
> >> +	 * that simultaneous faults will not both try to call
> >> +	 * io_remap_pfn_range().
> >> +	 *
> >> +	 * However, this might mean that callers to which we returned success
> >> +	 * optimistically will see more faults until the remap is complete.
> >> +	 */
> >> +	vma_present = __vfio_pci_add_vma(vdev, vma);
> >> +	if (vma_present < 0) {
> >>  		ret = VM_FAULT_OOM;
> >>  		mutex_unlock(&vdev->vma_lock);
> >>  		goto up_out;
> >> @@ -1631,6 +1651,9 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> >>
> >>  	mutex_unlock(&vdev->vma_lock);
> >>
> >> +	if (vma_present)
> >> +		goto up_out;
> >> +
> >>  	if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> >>  			       vma->vm_end - vma->vm_start, vma->vm_page_prot))
> >>  		ret = VM_FAULT_SIGBUS;
> >> --
> >> 2.29.2
> >
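
For context, below is a rough userspace sketch of the access pattern described in the thread: one process mmap()s a vfio-pci BAR and several threads then touch the not-yet-faulted mapping at the same time, so their first accesses fault into vfio_pci_mmap_fault() concurrently. It assumes the VFIO container/group setup has been done elsewhere and the resulting device fd is passed in; the use of BAR0, the thread count, the page-stride access, and the function name map_and_touch_bar0() are illustrative choices, not details taken from the report.

/*
 * Sketch only: concurrent first-touch of a vfio-pci BAR mapping.
 * Assumes device_fd came from VFIO_GROUP_GET_DEVICE_FD and that
 * BAR0 is mmap-able; error handling is minimal.
 */
#include <pthread.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

#define NTHREADS 8

static volatile unsigned char *bar;
static size_t bar_size;

static void *toucher(void *arg)
{
	long id = (long)arg;
	/* Each thread reads a different page of the same, freshly mapped VMA. */
	size_t off = ((size_t)id * 4096) % bar_size;

	return (void *)(unsigned long)bar[off];
}

int map_and_touch_bar0(int device_fd)
{
	struct vfio_region_info reg = {
		.argsz = sizeof(reg),
		.index = VFIO_PCI_BAR0_REGION_INDEX,
	};
	pthread_t tid[NTHREADS];
	long i;

	if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &reg))
		return -1;

	bar_size = reg.size;
	bar = mmap(NULL, reg.size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   device_fd, reg.offset);
	if (bar == MAP_FAILED)
		return -1;

	/*
	 * Nothing has touched the mapping yet, so the reads in toucher()
	 * fault more or less simultaneously, exercising the path the
	 * patch above serializes.
	 */
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, toucher, (void *)i);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);

	munmap((void *)bar, bar_size);
	return 0;
}

With the pre-patch vfio_pci_mmap_fault(), two such faults racing past the vma_list bookkeeping can both reach io_remap_pfn_range() and trip the BUG_ON(!pte_none(*pte)) mentioned in the commit message; with the patch, only the first fault performs the remap and the rest return early.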