> From: Du, Frank [mailto:frank...@intel.com]
> Sent: Wednesday, 22 May 2024 03.25
>
> > From: Ferruh Yigit <ferruh.yi...@amd.com>
> > Sent: Wednesday, May 22, 2024 1:58 AM
> >
> > On 5/11/2024 6:26 AM, Frank Du wrote:
> > > The current calculation assumes that the mbufs are contiguous.
> > > However, this assumption is incorrect when the memory spans across a
> > > huge page.
> > > Correct to directly read the size from the mempool memory chunks.
> > >
> > > Signed-off-by: Frank Du <frank...@intel.com>
> > >
> > > ---
> > > v2:
> > > * Add virtual contiguous detect for multiple memhdrs.
> > > ---
> > >  drivers/net/af_xdp/rte_eth_af_xdp.c | 34 ++++++++++++++++++++++++-----
> > >  1 file changed, 28 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > index 268a130c49..7456108d6d 100644
> > > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
> > >  }
> > >
> > >  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > > -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t *align)
> > > +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp,
> > > +					uint64_t *align, size_t *len)
> > >  {
> > > -	struct rte_mempool_memhdr *memhdr;
> > > +	struct rte_mempool_memhdr *memhdr, *next;
> > >  	uintptr_t memhdr_addr, aligned_addr;
> > > +	size_t memhdr_len = 0;
> > >
> > > +	/* get the mempool base addr and align */
> > >  	memhdr = STAILQ_FIRST(&mp->mem_list);
> > >  	memhdr_addr = (uintptr_t)memhdr->addr;
This is not a new bug; but if the mempool is not populated, memhdr is NULL
here.

> > >  	aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > >  	*align = memhdr_addr - aligned_addr;
> > >
> >
> > I am aware this is not part of this patch, but as a note, can't we use
> > 'RTE_ALIGN_FLOOR' to calculate the aligned address?
>
> Sure, will use RTE_ALIGN_FLOOR in the next version.
>
> > >
> > > +	memhdr_len += memhdr->len;
> > > +
> > > +	/* check if virtual contiguous memory for multiple memhdrs */
> > > +	next = STAILQ_NEXT(memhdr, next);
> > > +	while (next != NULL) {
> > > +		if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr->len) {
> > > +			AF_XDP_LOG(ERR, "memory chunks not virtual contiguous, "
> > > +				   "next: %p, cur: %p(len: %" PRId64 ")\n",
> > > +				   next->addr, memhdr->addr, memhdr->len);
> > > +			return 0;
> > > +		}
> >
> > Isn't there a mempool flag that can help us figure out whether the mempool
> > is not IOVA contiguous? Isn't it sufficient on its own?
>
> Indeed, what we need to ascertain is whether it's contiguous in CPU virtual
> space, not IOVA. I haven't come across a flag specifically for CPU virtual
> contiguity. The major limitation in XDP is that XSK UMEM only supports
> registering a single contiguous virtual memory area.

I would assume that the EAL memory manager merges free memory into contiguous
chunks whenever possible. @Anatoly, please confirm?

If my assumption is correct, it means that if mp->nb_mem_chunks != 1, then the
mempool is not virtual contiguous. And if mp->nb_mem_chunks == 1, then it is;
there is no need to iterate through the memhdr list.

> > >
> > > +		/* virtual contiguous */
> > > +		memhdr = next;
> > > +		memhdr_len += memhdr->len;
> > > +		next = STAILQ_NEXT(memhdr, next);
> > > +	}
> > >
> > > +	*len = memhdr_len;
> > >  	return aligned_addr;
> > >  }
> > >
> >
> > This function goes into too much detail of the mempool object, and any
> > change in mempool internals has the potential to break this code.
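Outside of DPDK, the contiguity walk in the quoted get_memhdr_info() can be sketched with a plain singly linked chunk list. This is a minimal illustration, not DPDK code; `struct chunk` and `chunks_contig_len` are hypothetical stand-ins for `rte_mempool_memhdr` and the loop above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_mempool_memhdr: a singly linked
 * list of memory chunks, each with a start address and a length. */
struct chunk {
	void *addr;
	size_t len;
	struct chunk *next;
};

/* Return the total length if all chunks are virtually contiguous
 * (each chunk starts exactly where the previous one ends), or 0
 * otherwise -- mirroring the loop in the quoted get_memhdr_info(). */
static size_t chunks_contig_len(const struct chunk *c)
{
	size_t total = 0;

	for (; c != NULL; c = c->next) {
		if (c->next != NULL &&
		    (uintptr_t)c->next->addr != (uintptr_t)c->addr + c->len)
			return 0; /* gap between chunks: not contiguous */
		total += c->len;
	}
	return total;
}
```

As in the patch, a return value of 0 doubles as the failure indicator, which is why the caller must treat a NULL/zero base address as an error.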
> >
> > @Andrew, @Morten, do you think it makes sense to have a
> > 'rte_mempool_info_get()' kind of function, that provides at least the
> > address and length of the mempool, and is used here?
> >
> > This helps to hide the internal details and complexity of the mempool from
> > users.

I think all the relevant information is available as (public) fields directly
in the rte_mempool.

I agree about hiding internal details. For discriminating between private and
public information, I would prefer marking the "private" fields in the
rte_mempool structure as such.

Optimally, we need an rte_mempool_create() flag, specifying that the mempool
objects must be allocated as one chunk of memory with contiguous virtual
addresses when populating the mempool. As discussed in another thread [1], the
proposed pointer compression library would also benefit from such a mempool
flag.

[1] https://inbox.dpdk.org/dev/98cbd80474fa8b44bf855df32c47dc35e9f...@smartserver.smartshare.dk/

> > > @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct
> > > pmd_internals *internals,
> > >  	void *base_addr = NULL;
> > >  	struct rte_mempool *mb_pool = rxq->mb_pool;
> > >  	uint64_t umem_size, align = 0;
> > > +	size_t len = 0;
> > >
> > >  	if (internals->shared_umem) {
> > >  		if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> > > @@ -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct
> > > pmd_internals *internals,
> > >  	}
> > >
> > >  	umem->mb_pool = mb_pool;
> > > -	base_addr = (void *)get_base_addr(mb_pool, &align);
> > > -	umem_size = (uint64_t)mb_pool->populated_size *
> > > -		    (uint64_t)usr_config.frame_size +
> > > -		    align;
> > > +	base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
> > >
> >
> > Is this calculation correct if the mempool is not already aligned to the
> > page size?

Please note: The mempool uses one memzone for the mempool structure itself.
The objects in the mempool are stored in another memzone (or multiple other
memzones, if necessary).
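The alignment question above comes down to simple floor-alignment arithmetic. A minimal standalone sketch follows; the helper name `page_floor` is illustrative, not a DPDK API, though DPDK's `RTE_ALIGN_FLOOR` macro computes the same thing for power-of-two alignments:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to the nearest multiple of pagesz.
 * pagesz must be a power of two for the mask trick to be valid,
 * which holds for getpagesize() on all relevant platforms. */
static inline uintptr_t page_floor(uintptr_t addr, uintptr_t pagesz)
{
	return addr & ~(pagesz - 1);
}
```

With a 0x1000 page size, an unaligned chunk start such as 0x000a1080 floors to 0x000a1000, and the 0x80-byte difference is what the patch stores in `*align`.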
I think you are talking about the alignment of the mempool object chunk, not
of the mempool structure itself.

> >
> > As an example: the page size is '0x1000' and "memhdr_addr = 0x000a1080";
> > the returned aligned address is '0x000a1000', so "base_addr = 0x000a1000".
> >
> > Any access between '0x000a1000' & '0x000a1080' is invalid. Is this
> > expected?
>
> Yes, since the XSK UMEM memory area requires page alignment. However, no
> need to worry; the memory pointer in the XSK TX/RX descriptor is obtained
> from the mbuf data area. We don't have any chance to access the invalid
> range [0x000a1000: 0x000a1080] here.
>
> > >
> > > +	if (!base_addr) {
> > > +		AF_XDP_LOG(ERR, "Failed to parse memhdr info from pool\n");
> >
> > The log message is not accurate; it is not that parsing the memhdr info
> > failed, but that the mempool did not satisfy the expectation.

Looking at get_memhdr_info() above, it could be either the mempool base
address lookup or the memhdr contiguity check that fails.

> Thanks, will correct it in the next version.
>
> > > +		goto err;
> > > +	}
> > > +	umem_size = (uint64_t)len + align;
> > >
> > >  	ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
> > >  			       &rxq->fq, &rxq->cq, &usr_config);
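To make the `umem_size = len + align` sizing concrete, here is a standalone sketch using the example numbers from the thread. The function `umem_size_for` is mine, purely illustrative of the quoted logic, not a DPDK function:

```c
#include <assert.h>
#include <stdint.h>

/* Floor-align the first chunk address to the page, remember the head
 * slack as 'align', and size the UMEM as total_len + align so the
 * registered area still covers every byte of every chunk. */
static uint64_t umem_size_for(uintptr_t memhdr_addr, uint64_t total_len,
			      uintptr_t pagesz, uintptr_t *base_out)
{
	uintptr_t base = memhdr_addr & ~(pagesz - 1); /* page-floor */
	uint64_t align = memhdr_addr - base;          /* unused head room */

	*base_out = base;
	return total_len + align; /* umem_size = len + align */
}
```

With memhdr_addr = 0x000a1080, a 0x1000 page size, and (assumed for illustration) total chunk length 0x10000, the base is 0x000a1000 and the size 0x10080, so base + size reaches past the last chunk byte; the 0x80-byte head room below memhdr_addr is registered but, as Frank notes, never referenced by any descriptor.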