Hi Xuan, Please see inline.
> -----Original Message----- > From: Ding, Xuan <xuan.d...@intel.com> > Sent: Tuesday, July 4, 2023 10:43 AM > To: Gupta, Nipun <nipun.gu...@amd.com>; dev@dpdk.org; > tho...@monjalon.net; Burakov, Anatoly <anatoly.bura...@intel.com>; Yigit, > Ferruh <ferruh.yi...@amd.com> > Cc: Agarwal, Nikhil <nikhil.agar...@amd.com>; He, Xingguang > <xingguang...@intel.com>; Ling, WeiX <weix.l...@intel.com> > Subject: RE: [PATCH] vfio: do not coalesce DMA mappings > > Hi Nipun, > > > -----Original Message----- > > From: Ding, Xuan > > Sent: Friday, June 30, 2023 1:58 PM > > To: Nipun Gupta <nipun.gu...@amd.com>; dev@dpdk.org; > > tho...@monjalon.net; Burakov, Anatoly <anatoly.bura...@intel.com>; > > ferruh.yi...@amd.com > > Cc: nikhil.agar...@amd.com; He, Xingguang <xingguang...@intel.com>; Ling, > > WeiX <weix.l...@intel.com> > > Subject: RE: [PATCH] vfio: do not coalesce DMA mappings > > > > Hi Nipun, > > > > Replies are inline. > > > > > -----Original Message----- > > > From: Nipun Gupta <nipun.gu...@amd.com> > > > Sent: Friday, June 30, 2023 9:46 AM > > > To: Ding, Xuan <xuan.d...@intel.com>; dev@dpdk.org; > > > tho...@monjalon.net; Burakov, Anatoly <anatoly.bura...@intel.com>; > > > ferruh.yi...@amd.com > > > Cc: nikhil.agar...@amd.com; He, Xingguang <xingguang...@intel.com>; > > > Ling, WeiX <weix.l...@intel.com> > > > Subject: Re: [PATCH] vfio: do not coalesce DMA mappings > > > > > > Hi Xuan, > > > > > > Thanks for pointing out the issue and figuring out the patch which > > > introduced this. If you have answers to below queries, please let me know: > > > > > > Is there any other test cases which tests "--no-huge" which pass? > > > > Yes, there are test cases adding "--no-huge" option to validate 4k page > > size in > > async vhost. > > Actually, the page size is decided by front-end, so I think this case can be > > removed. > > > > Previously, testpmd can start with "--no-huge" options (not sure if there > > are > > test cases). > > Cmd: ./build/app/dpdk-testpmd -l 5-6 -n 4 --no-huge -m 1024 -- -i > > > > > > > > Also, if we change the "-m" option to provide lower memory, does the > > > test pass? > > > > "-m" option is also added and does not work. > > > > > > > > When you mention too many pages exceed the capability of IOMMU, you > > > are referring to HW capability to create multiple pages? Here it seems > > > in case of 4K page size we need 256K pages which is limiting the capacity? > > > > Yes, this is the result of my initial debugging. > > The direct impact is that this kind of testpmd cases cannot start now. > > If this is expected, I think we can close this defect and ignore the > > "--no-huge" > > option when start. > > Any insights? Should we just ignore the "--no-huge" option and close this > defect? > Now we did this as a workaround. Seems no one uses the "--no-huge" option in > testpmd now. VFIO supports dma_entry_limit as a module parameter, which has a default value of U16_MAX i.e. 64K, most likely which is limiting creation of 256K entries for 4K pages here. This can be modified while inserting vfio module: modprobe vfio_iommu_type1 dma_entry_limit=1000000 You can update the test case with updating this limit, and test case shall pass. Thanks, Nipun > > Thanks, > Xuan > > > > > Regards, > > Xuan > > > > > > > > Regards, > > > Nipun > > > > > > On 6/29/2023 1:51 PM, Ding, Xuan wrote: > > > > Hi Nipun, > > > > > > > > I'd like to appreciate your time reading this email. > > > > > > > > Our QA team found that since this commit "a399d7b5a994: do not > > > > coalesce DMA mappings" is introduced, the dpdk testpmd start with > > > > "--no- > > > huge" parameters will failed, and shows "EAL: Cannot set up DMA > > > remapping, error 28 (No space left on device)". > > > > So they reported it on dpdk Bugzilla: > > > https://bugs.dpdk.org/show_bug.cgi?id=1235. > > > > > > > > I understand this feature is to keep consistent with the kernel and > > > > not allow > > > memory segments be merged. > > > > The side effect is the testpmd with "--no-huge" parameters will not > > > > be able > > > to start because the too many pages will exceed the capability of IOMMU. > > > > Is it expected? Should we remove the --no-huge" in our testcase? > > > > > > > > Regards, > > > > Xuan > > > > > > > >> -----Original Message----- > > > >> From: Nipun Gupta <nipun.gu...@amd.com> > > > >> Sent: Friday, December 30, 2022 5:59 PM > > > >> To: dev@dpdk.org; tho...@monjalon.net; Burakov, Anatoly > > > >> <anatoly.bura...@intel.com>; ferruh.yi...@amd.com > > > >> Cc: nikhil.agar...@amd.com; Nipun Gupta <nipun.gu...@amd.com> > > > >> Subject: [PATCH] vfio: do not coalesce DMA mappings > > > >> > > > >> At the cleanup time when dma unmap is done, linux kernel does not > > > >> allow unmap of individual segments which were coalesced together > > > >> while creating the DMA map for type1 IOMMU mappings. So, this > > > >> change updates the mapping of the memory > > > >> segments(hugepages) on a per-page basis. > > > >> > > > >> Signed-off-by: Nipun Gupta <nipun.gu...@amd.com> > > > >> --- > > > >> > > > >> When hotplug of devices is used, multiple pages gets colaeced and a > > > >> single mapping gets created for these pages (using APIs > > > >> rte_memseg_contig_walk() and type1_map_contig(). On the cleanup > > > >> time when the memory is released, the VFIO does not cleans up that > > > >> memory and following error is observed in the eal for 2MB > > > >> hugepages: > > > >> EAL: Unexpected size 0 of DMA remapping cleared instead of 2097152 > > > >> > > > >> This is because VFIO does not clear the DMA (refer API > > > >> vfio_dma_do_unmap() - > > > >> > > > https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu > > > _type > > > 1. > > > >> c#L1330), > > > >> where it checks the dma mapping where it checks for IOVA to free: > > > >> > > > https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu > > > _type > > > 1. > > > >> c#L1418. > > > >> > > > >> Thus this change updates the mapping to be created individually > > > >> instead of colaecing them. > > > >> > > > >> lib/eal/linux/eal_vfio.c | 29 ----------------------------- > > > >> 1 file changed, 29 deletions(-) > > > >> > > > >> diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c > > > >> index 549b86ae1d..56edccb0db 100644 > > > >> --- a/lib/eal/linux/eal_vfio.c > > > >> +++ b/lib/eal/linux/eal_vfio.c > > > >> @@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char > > > *sysfs_base, > > > >> return 1; > > > >> } > > > >> > > > >> -static int > > > >> -type1_map_contig(const struct rte_memseg_list *msl, const struct > > > >> rte_memseg *ms, > > > >> - size_t len, void *arg) > > > >> -{ > > > >> - int *vfio_container_fd = arg; > > > >> - > > > >> - if (msl->external) > > > >> - return 0; > > > >> - > > > >> - return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, > > > >> ms->iova, > > > >> - len, 1); > > > >> -} > > > >> - > > > >> static int > > > >> type1_map(const struct rte_memseg_list *msl, const struct > > > >> rte_memseg > > > *ms, > > > >> void *arg) > > > >> @@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list *msl, > > > >> const struct rte_memseg *ms, > > > >> if (ms->iova == RTE_BAD_IOVA) > > > >> return 0; > > > >> > > > >> - /* if IOVA mode is VA, we've already mapped the internal > > > >> segments > > > */ > > > >> - if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA) > > > >> - return 0; > > > >> - > > > >> return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, > > > >> ms->iova, > > > >> ms->len, 1); > > > >> } > > > >> @@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int > > > vfio_container_fd, > > > >> uint64_t vaddr, uint64_t iova, static int vfio_type1_dma_map(int > > > >> vfio_container_fd) { > > > >> - if (rte_eal_iova_mode() == RTE_IOVA_VA) { > > > >> - /* with IOVA as VA mode, we can get away with mapping > > > >> contiguous > > > >> - * chunks rather than going page-by-page. > > > >> - */ > > > >> - int ret = rte_memseg_contig_walk(type1_map_contig, > > > >> - &vfio_container_fd); > > > >> - if (ret) > > > >> - return ret; > > > >> - /* we have to continue the walk because we've skipped > > > >> the > > > >> - * external segments during the config walk. > > > >> - */ > > > >> - } > > > >> return rte_memseg_walk(type1_map, &vfio_container_fd); } > > > >> > > > >> -- > > > >> 2.25.1 > > > >