On 5/22/2024 2:25 AM, Du, Frank wrote:
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>> Sent: Wednesday, May 22, 2024 1:58 AM
>> To: Du, Frank <frank...@intel.com>; dev@dpdk.org; Andrew Rybchenko
>> <andrew.rybche...@oktetlabs.ru>; Morten Brørup
>> <m...@smartsharesystems.com>
>> Cc: Loftus, Ciara <ciara.lof...@intel.com>; Burakov, Anatoly
>> <anatoly.bura...@intel.com>
>> Subject: Re: [PATCH v2] net/af_xdp: fix umem map size for zero copy
>>
>> On 5/11/2024 6:26 AM, Frank Du wrote:
>>> The current calculation assumes that the mbufs are contiguous. However,
>>> this assumption is incorrect when the memory spans across a huge page.
>>> Correct to directly read the size from the mempool memory chunks.
>>>
>>> Signed-off-by: Frank Du <frank...@intel.com>
>>>
>>> ---
>>> v2:
>>> * Add virtual contiguous detect for multiple memhdrs.
>>> ---
>>>  drivers/net/af_xdp/rte_eth_af_xdp.c | 34 ++++++++++++++++++++++++-----
>>>  1 file changed, 28 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
>>> index 268a130c49..7456108d6d 100644
>>> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
>>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>>> @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
>>>  }
>>>
>>>  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
>>> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t *align)
>>> +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp, uint64_t *align, size_t *len)
>>>  {
>>> -	struct rte_mempool_memhdr *memhdr;
>>> +	struct rte_mempool_memhdr *memhdr, *next;
>>>  	uintptr_t memhdr_addr, aligned_addr;
>>> +	size_t memhdr_len = 0;
>>>
>>> +	/* get the mempool base addr and align */
>>>  	memhdr = STAILQ_FIRST(&mp->mem_list);
>>>  	memhdr_addr = (uintptr_t)memhdr->addr;
>>>  	aligned_addr = memhdr_addr & ~(getpagesize() - 1);
>>>  	*align = memhdr_addr - aligned_addr;
>>>
>>
>> I am aware this is not part of this patch, but as a note, can't we use
>> 'RTE_ALIGN_FLOOR' to calculate the aligned address?
>
> Sure, will use RTE_ALIGN_FLOOR in next version.
>
>>
>>
>>> +	memhdr_len += memhdr->len;
>>> +
>>> +	/* check if virtual contiguous memory for multiple memhdrs */
>>> +	next = STAILQ_NEXT(memhdr, next);
>>> +	while (next != NULL) {
>>> +		if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr->len) {
>>> +			AF_XDP_LOG(ERR, "memory chunks not virtual contiguous, "
>>> +					"next: %p, cur: %p(len: %" PRId64 " )\n",
>>> +					next->addr, memhdr->addr, memhdr->len);
>>> +			return 0;
>>> +		}
>>>
>>
>> Isn't there a mempool flag that can help us figure out that the mempool is
>> not IOVA contiguous? Isn't it sufficient on its own?
>
> Indeed, what we need to ascertain is whether it's contiguous in CPU virtual
> space, not IOVA. I haven't come across a flag specifically for CPU virtual
> contiguity. The major limitation in XDP is that XSK UMEM only supports
> registering a single contiguous virtual memory area.
>
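For reference, the virtual-contiguity check described above could also be
written against the public mempool iterator instead of walking 'mem_list'
directly. The sketch below is only an illustration of that idea (the helper
names are made up here); it is not the patch code:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <rte_mempool.h>

/* State carried across the per-chunk callback. */
struct virt_contig_state {
	uintptr_t expected_next; /* where the next chunk must start */
	size_t total_len;        /* sum of all chunk lengths */
	bool contiguous;         /* cleared once a gap is found */
};

/* Invoked once per memory chunk by rte_mempool_mem_iter(). */
static void
chunk_check_cb(struct rte_mempool *mp __rte_unused, void *opaque,
	       struct rte_mempool_memhdr *memhdr, unsigned int mem_idx)
{
	struct virt_contig_state *st = opaque;

	if (mem_idx != 0 && (uintptr_t)memhdr->addr != st->expected_next)
		st->contiguous = false;
	st->expected_next = (uintptr_t)memhdr->addr + memhdr->len;
	st->total_len += memhdr->len;
}

/* Return true if all chunks of the mempool form one contiguous region in
 * process virtual address space, and report the combined length. */
static bool
mempool_virt_contig(struct rte_mempool *mp, size_t *len)
{
	struct virt_contig_state st = { .contiguous = true };

	rte_mempool_mem_iter(mp, chunk_check_cb, &st);
	*len = st.total_len;
	return st.contiguous;
}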
'RTE_MEMPOOL_F_NO_IOVA_CONTIG' is the flag I was looking for. This flag
being *cleared* implies IOVA contiguous, but I am not sure if that is
guaranteed; need to check.

And I may be wrong here, but as far as I remember, in IOVA as VA mode the
process virtual address and the IOVA are the same, so IOVA contiguous is
the same as contiguous CPU virtual address.

>>
>>
>>> +		/* virtual contiguous */
>>> +		memhdr = next;
>>> +		memhdr_len += memhdr->len;
>>> +		next = STAILQ_NEXT(memhdr, next);
>>> +	}
>>>
>>> +	*len = memhdr_len;
>>>  	return aligned_addr;
>>>  }
>>>
>>
>> This function goes into too many details of the mempool object, and any
>> change in the mempool internals has the potential to break this code.
>>
>> @Andrew, @Morten, do you think it makes sense to have an
>> 'rte_mempool_info_get()' kind of function that provides at least the
>> address and length of the mempool, and to use it here?
>>
>> This would help hide the internal details and complexity of the mempool
>> from users.
>>
>>
>>>
>>> @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
>>>  	void *base_addr = NULL;
>>>  	struct rte_mempool *mb_pool = rxq->mb_pool;
>>>  	uint64_t umem_size, align = 0;
>>> +	size_t len = 0;
>>>
>>>  	if (internals->shared_umem) {
>>>  		if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
>>> @@ -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
>>>  	}
>>>
>>>  	umem->mb_pool = mb_pool;
>>> -	base_addr = (void *)get_base_addr(mb_pool, &align);
>>> -	umem_size = (uint64_t)mb_pool->populated_size *
>>> -				(uint64_t)usr_config.frame_size +
>>> -				align;
>>> +	base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
>>>
>>
>> Is this calculation correct if the mempool is not already aligned to the
>> page size?
>>
>> For example, if the page size is '0x1000' and "memhdr_addr = 0x000a1080",
>> the returned aligned address is '0x000a1000', so "base_addr = 0x000a1000".
>>
>> Any access between '0x000a1000' & '0x000a1080' is invalid. Is this expected?
>
> Yes, since the XSK UMEM memory area requires page alignment. However, no
> need to worry; the memory pointer in the XSK TX/RX descriptor is obtained
> from the mbuf data area. We never get a chance to access the invalid range
> [0x000a1000: 0x000a1080] here.
>

Thanks for the clarification.
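To illustrate the flag idea raised earlier in this thread, here is a minimal
sketch of what such a shortcut could look like, assuming the EAL IOVA-mode
query and the mempool 'flags' field. Whether a cleared
'RTE_MEMPOOL_F_NO_IOVA_CONTIG' really guarantees one contiguous region for
the whole pool is exactly the open question above, so treat this as a rough
starting point, not a settled check:

#include <stdbool.h>
#include <rte_eal.h>
#include <rte_mempool.h>

/* Heuristic only: if the mempool was created without
 * RTE_MEMPOOL_F_NO_IOVA_CONTIG and the EAL runs in IOVA-as-VA mode,
 * IOVA-contiguous memory is also contiguous in process virtual address
 * space.  Whether the cleared flag is a hard guarantee for the whole
 * pool (rather than per object) still needs to be confirmed. */
static bool
mempool_maybe_virt_contig(const struct rte_mempool *mp)
{
	if (mp->flags & RTE_MEMPOOL_F_NO_IOVA_CONTIG)
		return false;

	return rte_eal_iova_mode() == RTE_IOVA_VA;
}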