RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy

Morten Brørup Wed, 22 May 2024 00:26:43 -0700

> From: Du, Frank [mailto:frank...@intel.com]
> Sent: Wednesday, 22 May 2024 03.25
> 
> > From: Ferruh Yigit <ferruh.yi...@amd.com>
> > Sent: Wednesday, May 22, 2024 1:58 AM
> >
> > On 5/11/2024 6:26 AM, Frank Du wrote:
> > > The current calculation assumes that the mbufs are contiguous.
> > > However, this assumption is incorrect when the memory spans across a huge
> > page.
> > > Correct to directly read the size from the mempool memory chunks.
> > >
> > > Signed-off-by: Frank Du <frank...@intel.com>
> > >
> > > ---
> > > v2:
> > > * Add virtual contiguous detect for for multiple memhdrs.
> > > ---
> > >  drivers/net/af_xdp/rte_eth_af_xdp.c | 34
> > > ++++++++++++++++++++++++-----
> > >  1 file changed, 28 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > index 268a130c49..7456108d6d 100644
> > > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev
> > > __rte_unused,  }
> > >
> > >  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > > -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> > > uint64_t *align)
> > > +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp,
> > > +uint64_t *align, size_t *len)
> > >  {
> > > - struct rte_mempool_memhdr *memhdr;
> > > + struct rte_mempool_memhdr *memhdr, *next;
> > >   uintptr_t memhdr_addr, aligned_addr;
> > > + size_t memhdr_len = 0;
> > >
> > > + /* get the mempool base addr and align */
> > >   memhdr = STAILQ_FIRST(&mp->mem_list);
> > >   memhdr_addr = (uintptr_t)memhdr->addr;


This is not a new bug; but if the mempool is not populated, memhdr is NULL here.

> > >   aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > >   *align = memhdr_addr - aligned_addr;
> > >
> >
> > I am aware this is not part of this patch, but as note, can't we use
> > 'RTE_ALIGN_FLOOR' to calculate aligned address.
> 
> Sure, will use RTE_ALIGN_FLOOR in next version.
> 
> >
> >
> > > + memhdr_len += memhdr->len;
> > > +
> > > + /* check if virtual contiguous memory for multiple memhdrs */
> > > + next = STAILQ_NEXT(memhdr, next);
> > > + while (next != NULL) {
> > > +         if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr-
> > >len) {
> > > +                 AF_XDP_LOG(ERR, "memory chunks not virtual
> > contiguous, "
> > > +                                 "next: %p, cur: %p(len: %" PRId64
> > " )\n",
> > > +                                 next->addr, memhdr->addr, memhdr-
> > >len);
> > > +                 return 0;
> > > +         }
> > >
> >
> > Isn't there a mempool flag that can help us figure out mempool is not IOVA
> > contiguous? Isn't it sufficient on its own?
> 
> Indeed, what we need to ascertain is whether it's contiguous in CPU virtual
> space, not IOVA. I haven't come across a flag specifically for CPU virtual
> contiguity. The major limitation in XDP is XSK UMEM only supports registering
> a single contiguous virtual memory area.

I would assume that the EAL memory manager merges free memory into contiguous 
chunks whenever possible.
@Anatoly, please confirm?

If my assumption is correct, it means that if mp->nb_mem_chunks != 1, then the 
mempool is not virtual contiguous. And if mp->nb_mem_chunks == 1, then it is; 
there is no need to iterate through the memhdr list.

> 
> >
> >
> > > +         /* virtual contiguous */
> > > +         memhdr = next;
> > > +         memhdr_len += memhdr->len;
> > > +         next = STAILQ_NEXT(memhdr, next);
> > > + }
> > >
> > > + *len = memhdr_len;
> > >   return aligned_addr;
> > >  }
> > >
> >
> > This function goes too much details of the mempool object, and any change in
> > mempool details has potential to break this code.
> >
> > @Andrew, @Morten, do you think does it make sense to have
> > 'rte_mempool_info_get()' kind of function, that provides at least address
> and
> > length of the mempool, and used here?
> >
> > This helps to hide internal details and complexity of the mempool for users.

I think all the relevant information is available as (public) fields directly 
in the rte_mempool.
I agree about hiding internal details. For discriminating between private and 
public information, I would prefer marking the "private" fields in the 
rte_mempool structure as such.

Optimally we need an rte_mempool_create() flag, specifying that the mempool 
objects must be allocated as one chunk of memory with contiguous virtual 
addresses when populating the mempool. As discussed in another thread [1], the 
proposed pointer compression library would also benefit from such a mempool 
flag.

[1] 
https://inbox.dpdk.org/dev/98cbd80474fa8b44bf855df32c47dc35e9f...@smartserver.smartshare.dk/

> >
> >
> > >
> > > @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct
> > pmd_internals *internals,
> > >   void *base_addr = NULL;
> > >   struct rte_mempool *mb_pool = rxq->mb_pool;
> > >   uint64_t umem_size, align = 0;
> > > + size_t len = 0;
> > >
> > >   if (internals->shared_umem) {
> > >           if (get_shared_umem(rxq, internals->if_name, &umem) < 0) @@
> > > -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct
> > pmd_internals *internals,
> > >           }
> > >
> > >           umem->mb_pool = mb_pool;
> > > -         base_addr = (void *)get_base_addr(mb_pool, &align);
> > > -         umem_size = (uint64_t)mb_pool->populated_size *
> > > -                         (uint64_t)usr_config.frame_size +
> > > -                         align;
> > > +         base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
> > >
> >
> > Is this calculation correct if mempool is not already aligned to page size?

Please note: The mempool uses one memzone for the mempool structure itself. The 
objects in the mempool are stored in another memzone (or multiple other 
memzones, if necessary). I think you are talking about the alignment of the 
mempool object chunk, not of the mempool structure itself.

> >
> > Like in an example page size is '0x1000', and "memhdr_addr = 0x000a1080"
> > returned aligned address is '0x000a1000', "base_addr = 0x000a1000"
> >
> > Any access between '0x000a1000' & '0x000a1080' is invalid. Is this expected?
> 
> Yes, since the XSK UMEM memory area requires page alignment. However, no need
> to worry; the memory pointer in the XSK TX/RX descriptor is obtained from the
> mbuf data area. We don’t have any chance to access the invalid range
> [0x000a1000: 0x000a1080] here.
> 
> >
> >
> > > +         if (!base_addr) {
> > > +                 AF_XDP_LOG(ERR, "Failed to parse memhdr info from
> > pool\n");
> > >
> >
> > Log message is not accurate, it is not parsing memhdr info failed, but
> mempool
> > was not satisfying expectation.

Looking at get_memhdr_info() above, it could be either mempool or memhdr 
failing to parse.

> 
> Thanks, will correct it in next version.
> 
> >
> > > +                 goto err;
> > > +         }
> > > +         umem_size = (uint64_t)len + align;
> > >
> > >           ret = xsk_umem__create(&umem->umem, base_addr,
> > umem_size,
> > >                           &rxq->fq, &rxq->cq, &usr_config);

RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy

Reply via email to