Hi Xuan,

Please see inline.

> -----Original Message-----
> From: Ding, Xuan <xuan.d...@intel.com>
> Sent: Tuesday, July 4, 2023 10:43 AM
> To: Gupta, Nipun <nipun.gu...@amd.com>; dev@dpdk.org;
> tho...@monjalon.net; Burakov, Anatoly <anatoly.bura...@intel.com>; Yigit,
> Ferruh <ferruh.yi...@amd.com>
> Cc: Agarwal, Nikhil <nikhil.agar...@amd.com>; He, Xingguang
> <xingguang...@intel.com>; Ling, WeiX <weix.l...@intel.com>
> Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
> 
> Hi Nipun,
> 
> > -----Original Message-----
> > From: Ding, Xuan
> > Sent: Friday, June 30, 2023 1:58 PM
> > To: Nipun Gupta <nipun.gu...@amd.com>; dev@dpdk.org;
> > tho...@monjalon.net; Burakov, Anatoly <anatoly.bura...@intel.com>;
> > ferruh.yi...@amd.com
> > Cc: nikhil.agar...@amd.com; He, Xingguang <xingguang...@intel.com>; Ling,
> > WeiX <weix.l...@intel.com>
> > Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
> >
> > Hi Nipun,
> >
> > Replies are inline.
> >
> > > -----Original Message-----
> > > From: Nipun Gupta <nipun.gu...@amd.com>
> > > Sent: Friday, June 30, 2023 9:46 AM
> > > To: Ding, Xuan <xuan.d...@intel.com>; dev@dpdk.org;
> > > tho...@monjalon.net; Burakov, Anatoly <anatoly.bura...@intel.com>;
> > > ferruh.yi...@amd.com
> > > Cc: nikhil.agar...@amd.com; He, Xingguang <xingguang...@intel.com>;
> > > Ling, WeiX <weix.l...@intel.com>
> > > Subject: Re: [PATCH] vfio: do not coalesce DMA mappings
> > >
> > > Hi Xuan,
> > >
> > > Thanks for pointing out the issue and figuring out the patch which
> > > introduced this. If you have answers to below queries, please let me know:
> > >
> > > Are there any other test cases that test "--no-huge" and pass?
> >
> > Yes, there are test cases that add the "--no-huge" option to validate 4K
> > page size in async vhost.
> > Actually, the page size is decided by the front-end, so I think this case
> > can be removed.
> >
> > Previously, testpmd could start with the "--no-huge" option (not sure if
> > there are test cases).
> > Cmd: ./build/app/dpdk-testpmd -l 5-6 -n 4 --no-huge -m 1024 -- -i
> >
> > >
> > > Also, if we change the "-m" option to provide lower memory, does the
> > > test pass?
> >
> > "-m" option is also added and does not work.
> >
> > >
> > > When you mention that too many pages exceed the capability of the IOMMU,
> > > are you referring to the HW capability to create multiple mappings? Here
> > > it seems that with a 4K page size we need 256K pages, which exceeds the
> > > capacity?
> >
> > Yes, this is the result of my initial debugging.
> > The direct impact is that this kind of testpmd case cannot start now.
> > If this is expected, I think we can close this defect and ignore the
> > "--no-huge" option at startup.
> 
> Any insights? Should we just ignore the "--no-huge" option and close this
> defect?
> For now we did this as a workaround. It seems no one uses the "--no-huge"
> option in testpmd now.

VFIO supports dma_entry_limit as a module parameter for vfio_iommu_type1, with a
default value of U16_MAX, i.e. 64K entries. With 4K pages and "-m 1024" (1 GB),
the per-page mappings need 256K entries, which exceeds the default and is most
likely what is limiting the mapping creation here. The limit can be raised when
inserting the vfio module:
        modprobe vfio_iommu_type1 dma_entry_limit=1000000

You can update the test case to raise this limit first, and the test case should pass.
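
For reference, a sketch of how to check (and, if needed, raise) the limit at
runtime, assuming your kernel exposes the parameter under sysfs (vfio_iommu_type1
declares it writable); note the value is picked up when a container is opened,
so set it before starting testpmd:

        # read the current limit (default is 65535, i.e. U16_MAX)
        cat /sys/module/vfio_iommu_type1/parameters/dma_entry_limit
        # raise it to cover the ~256K mappings needed for 1 GB of 4K pages
        echo 1000000 > /sys/module/vfio_iommu_type1/parameters/dma_entry_limit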

Thanks,
Nipun

> 
> Thanks,
> Xuan
> 
> >
> > Regards,
> > Xuan
> >
> > >
> > > Regards,
> > > Nipun
> > >
> > > On 6/29/2023 1:51 PM, Ding, Xuan wrote:
> > > > Hi Nipun,
> > > >
> > > > I'd like to appreciate your time reading this email.
> > > >
> > > > Our QA team found that since the commit "a399d7b5a994: do not
> > > > coalesce DMA mappings" was introduced, starting dpdk testpmd with the
> > > > "--no-huge" parameter fails with "EAL: Cannot set up DMA remapping,
> > > > error 28 (No space left on device)".
> > > > So they reported it on dpdk Bugzilla:
> > > > https://bugs.dpdk.org/show_bug.cgi?id=1235.
> > > >
> > > > I understand this feature is to keep consistent with the kernel and not
> > > > allow memory segments to be merged.
> > > > The side effect is that testpmd with the "--no-huge" parameter will not
> > > > be able to start, because too many pages will exceed the capability of
> > > > the IOMMU.
> > > > Is it expected? Should we remove "--no-huge" in our test case?
> > > >
> > > > Regards,
> > > > Xuan
> > > >
> > > >> -----Original Message-----
> > > >> From: Nipun Gupta <nipun.gu...@amd.com>
> > > >> Sent: Friday, December 30, 2022 5:59 PM
> > > >> To: dev@dpdk.org; tho...@monjalon.net; Burakov, Anatoly
> > > >> <anatoly.bura...@intel.com>; ferruh.yi...@amd.com
> > > >> Cc: nikhil.agar...@amd.com; Nipun Gupta <nipun.gu...@amd.com>
> > > >> Subject: [PATCH] vfio: do not coalesce DMA mappings
> > > >>
> > > >> At the cleanup time when dma unmap is done, linux kernel does not
> > > >> allow unmap of individual segments which were coalesced together
> > > >> while creating the DMA map for type1 IOMMU mappings. So, this
> > > >> change updates the mapping of the memory
> > > >> segments (hugepages) on a per-page basis.
> > > >>
> > > >> Signed-off-by: Nipun Gupta <nipun.gu...@amd.com>
> > > >> ---
> > > >>
> > > >> When hotplug of devices is used, multiple pages get coalesced and a
> > > >> single mapping gets created for these pages (using the APIs
> > > >> rte_memseg_contig_walk() and type1_map_contig()). At cleanup time,
> > > >> when the memory is released, VFIO does not clean up that memory and
> > > >> the following error is observed in the EAL for 2MB hugepages:
> > > >> EAL: Unexpected size 0 of DMA remapping cleared instead of 2097152
> > > >>
> > > >> This is because VFIO does not clear the DMA (refer to the API
> > > >> vfio_dma_do_unmap() -
> > > >> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.c#L1330),
> > > >> where it checks the DMA mapping for the IOVA to free:
> > > >> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.c#L1418.
> > > >>
> > > >> Thus this change updates the mappings to be created individually
> > > >> instead of coalescing them.
> > > >>
> > > >>   lib/eal/linux/eal_vfio.c | 29 -----------------------------
> > > >>   1 file changed, 29 deletions(-)
> > > >>
> > > >> diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c
> > > >> index 549b86ae1d..56edccb0db 100644
> > > >> --- a/lib/eal/linux/eal_vfio.c
> > > >> +++ b/lib/eal/linux/eal_vfio.c
> > > >> @@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char *sysfs_base,
> > > >>        return 1;
> > > >>   }
> > > >>
> > > >> -static int
> > > >> -type1_map_contig(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
> > > >> -              size_t len, void *arg)
> > > >> -{
> > > >> -      int *vfio_container_fd = arg;
> > > >> -
> > > >> -      if (msl->external)
> > > >> -              return 0;
> > > >> -
> > > >> -      return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
> > > >> -                      len, 1);
> > > >> -}
> > > >> -
> > > >>   static int
> > > >>   type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
> > > >>                void *arg)
> > > >> @@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
> > > >>        if (ms->iova == RTE_BAD_IOVA)
> > > >>                return 0;
> > > >>
> > > >> -      /* if IOVA mode is VA, we've already mapped the internal segments */
> > > >> -      if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
> > > >> -              return 0;
> > > >> -
> > > >>        return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
> > > >>                        ms->len, 1);
> > > >>   }
> > > >> @@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
> > > >>   static int
> > > >>   vfio_type1_dma_map(int vfio_container_fd)
> > > >>   {
> > > >> -      if (rte_eal_iova_mode() == RTE_IOVA_VA) {
> > > >> -              /* with IOVA as VA mode, we can get away with mapping contiguous
> > > >> -               * chunks rather than going page-by-page.
> > > >> -               */
> > > >> -              int ret = rte_memseg_contig_walk(type1_map_contig,
> > > >> -                              &vfio_container_fd);
> > > >> -              if (ret)
> > > >> -                      return ret;
> > > >> -              /* we have to continue the walk because we've skipped the
> > > >> -               * external segments during the config walk.
> > > >> -               */
> > > >> -      }
> > > >>        return rte_memseg_walk(type1_map, &vfio_container_fd);
> > > >> }
> > > >>
> > > >> --
> > > >> 2.25.1
> > > >
