Hi Nipun,

Thank you for taking the time to read this email.
Our QA team found that since commit a399d7b5a994 ("vfio: do not coalesce
DMA mappings") was introduced, dpdk-testpmd fails to start with the
"--no-huge" parameter and reports:

    EAL: Cannot set up DMA remapping, error 28 (No space left on device)

They have reported it on the DPDK Bugzilla:
https://bugs.dpdk.org/show_bug.cgi?id=1235

I understand this change is meant to stay consistent with the kernel and
no longer allow memory segments to be merged. The side effect is that
testpmd with the "--no-huge" parameter can no longer start, because
mapping that many small pages individually exceeds the number of DMA
mappings the IOMMU allows. Is this expected? Should we remove
"--no-huge" from our test case?

Regards,
Xuan

> -----Original Message-----
> From: Nipun Gupta <nipun.gu...@amd.com>
> Sent: Friday, December 30, 2022 5:59 PM
> To: dev@dpdk.org; tho...@monjalon.net; Burakov, Anatoly
> <anatoly.bura...@intel.com>; ferruh.yi...@amd.com
> Cc: nikhil.agar...@amd.com; Nipun Gupta <nipun.gu...@amd.com>
> Subject: [PATCH] vfio: do not coalesce DMA mappings
>
> At cleanup time, when the DMA unmap is done, the Linux kernel does not
> allow unmapping individual segments that were coalesced together while
> creating the DMA map for type1 IOMMU mappings. So, this change updates
> the mapping of the memory segments (hugepages) to be done on a
> per-page basis.
>
> Signed-off-by: Nipun Gupta <nipun.gu...@amd.com>
> ---
>
> When hotplug of devices is used, multiple pages get coalesced and a
> single mapping gets created for these pages (using the APIs
> rte_memseg_contig_walk() and type1_map_contig()). At cleanup time,
> when the memory is released, VFIO does not clean up that memory, and
> the following error is observed in the EAL for 2MB hugepages:
>
>   EAL: Unexpected size 0 of DMA remapping cleared instead of 2097152
>
> This is because VFIO does not clear the DMA mapping (refer to
> vfio_dma_do_unmap() -
> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.c#L1330),
> where it checks for the IOVA to free:
> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.c#L1418
>
> Thus, this change creates the mappings individually instead of
> coalescing them.
>
>  lib/eal/linux/eal_vfio.c | 29 -----------------------------
>  1 file changed, 29 deletions(-)
>
> diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c
> index 549b86ae1d..56edccb0db 100644
> --- a/lib/eal/linux/eal_vfio.c
> +++ b/lib/eal/linux/eal_vfio.c
> @@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char *sysfs_base,
>  	return 1;
>  }
>
> -static int
> -type1_map_contig(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
> -		size_t len, void *arg)
> -{
> -	int *vfio_container_fd = arg;
> -
> -	if (msl->external)
> -		return 0;
> -
> -	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
> -			len, 1);
> -}
> -
>  static int
>  type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
>  		void *arg)
> @@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
>  	if (ms->iova == RTE_BAD_IOVA)
>  		return 0;
>
> -	/* if IOVA mode is VA, we've already mapped the internal segments */
> -	if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
> -		return 0;
> -
>  	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
>  			ms->len, 1);
>  }
> @@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
>  static int
>  vfio_type1_dma_map(int vfio_container_fd)
>  {
> -	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
> -		/* with IOVA as VA mode, we can get away with mapping contiguous
> -		 * chunks rather than going page-by-page.
> -		 */
> -		int ret = rte_memseg_contig_walk(type1_map_contig,
> -				&vfio_container_fd);
> -		if (ret)
> -			return ret;
> -		/* we have to continue the walk because we've skipped the
> -		 * external segments during the config walk.
> -		 */
> -	}
>  	return rte_memseg_walk(type1_map, &vfio_container_fd);
>  }
>
> --
> 2.25.1
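
P.S. To make the failure mode concrete, here is a back-of-the-envelope
sketch (my own illustration, not part of the patch). With "--no-huge" the
EAL backs its memory with 4 KiB pages, and after this commit each page gets
its own VFIO DMA mapping. The type1 IOMMU driver caps the number of
concurrent mappings per container (module parameter dma_entry_limit,
65535 by default if I read the driver correctly), so anything much beyond
256 MiB of 4 KiB pages should fail with ENOSPC, i.e. error 28:

    #include <stdio.h>

    int main(void)
    {
        const unsigned long page_sz = 4096;          /* --no-huge page size */
        const unsigned long dma_entry_limit = 65535; /* type1 default cap */
        const unsigned long mem_mb = 1024;           /* hypothetical testpmd -m value */

        unsigned long pages = (mem_mb << 20) / page_sz;
        printf("%lu MiB => %lu DMA mappings (limit %lu): %s\n",
               mem_mb, pages, dma_entry_limit,
               pages > dma_entry_limit ? "ENOSPC expected" : "fits");
        return 0;
    }

With 2 MiB hugepages the same 1024 MiB needs only 512 mappings, which is
why only the "--no-huge" case regresses.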
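
P.P.S. For completeness, a minimal sketch of the kernel behaviour the patch
works around, as I understand it (again my own illustration, not code from
the patch; it assumes "container" is a VFIO container fd already set up
with VFIO_TYPE1_IOMMU, and that two contiguous 2 MiB hugepages were
coalesced into one 4 MiB mapping at map time):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    void demo(int container, uint64_t vaddr, uint64_t iova)
    {
        struct vfio_iommu_type1_dma_map dma_map;
        struct vfio_iommu_type1_dma_unmap dma_unmap;

        /* Map two contiguous 2 MiB pages as a single 4 MiB entry, which
         * is what the old rte_memseg_contig_walk()/type1_map_contig()
         * path effectively did. */
        memset(&dma_map, 0, sizeof(dma_map));
        dma_map.argsz = sizeof(dma_map);
        dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        dma_map.vaddr = vaddr;
        dma_map.iova = iova;
        dma_map.size = 4UL << 20;
        ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);

        /* Now try to release only the second 2 MiB page. The request
         * lands inside the coalesced entry, which type1 will not split,
         * so the kernel unmaps nothing and reports 0 bytes back - the
         * "Unexpected size 0 of DMA remapping cleared" error the EAL
         * logs at cleanup. */
        memset(&dma_unmap, 0, sizeof(dma_unmap));
        dma_unmap.argsz = sizeof(dma_unmap);
        dma_unmap.iova = iova + (2UL << 20);
        dma_unmap.size = 2UL << 20;
        ioctl(container, VFIO_IOMMU_UNMAP_DMA, &dma_unmap);
        printf("bytes unmapped: %llu\n", (unsigned long long)dma_unmap.size);
    }

Mapping page-by-page, as the patch does, keeps every entry individually
unmappable, at the cost of many more entries per container - hence the
"--no-huge" regression above.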