> -----Original Message-----
> From: Ye, Xiaolong
> Sent: Sunday, March 31, 2019 8:38 PM
> To: Olivier Matz <olivier.m...@6wind.com>
> Cc: dev@dpdk.org; David Marchand <david.march...@redhat.com>; Andrew
> Rybchenko <arybche...@solarflare.com>; Zhang, Qi Z <qi.z.zh...@intel.com>;
> Karlsson, Magnus <magnus.karls...@intel.com>; Topel, Bjorn
> <bjorn.to...@intel.com>; Maxime Coquelin <maxime.coque...@redhat.com>;
> Stephen Hemminger <step...@networkplumber.org>; Yigit, Ferruh
> <ferruh.yi...@intel.com>; Luca Boccassi <bl...@debian.org>; Richardson, Bruce
> <bruce.richard...@intel.com>; Ananyev, Konstantin
> <konstantin.anan...@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v6 4/5] net/af_xdp: use mbuf mempool for
> buffer management
>
> Hi, Olivier
>
> Thanks for the comments.
>
> On 03/29, Olivier Matz wrote:
> >Hi Xiaolong,
> >
> >On Tue, Mar 26, 2019 at 08:20:28PM +0800, Xiaolong Ye wrote:
> >> Now, the af_xdp registered memory buffer is managed by rte_mempool. An mbuf
> >> allocated from rte_mempool can be converted to an xdp_desc address and
> >> vice versa.
> >>
> >> Signed-off-by: Xiaolong Ye <xiaolong...@intel.com>
> >> ---
> >> drivers/net/af_xdp/rte_eth_af_xdp.c | 117 +++++++++++++++++-----------
> >> 1 file changed, 72 insertions(+), 45 deletions(-)
> >>
> >> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> index 47a496ed7..a1fda9212 100644
> >> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> @@ -48,7 +48,11 @@ static int af_xdp_logtype;
> >>
> >> #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
> >> #define ETH_AF_XDP_NUM_BUFFERS 4096
> >> -#define ETH_AF_XDP_DATA_HEADROOM 0
> >> +/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
> >> +#define ETH_AF_XDP_MBUF_OVERHEAD 192
> >> +/* data start from offset 320 (192 + 128) bytes */
> >> +#define ETH_AF_XDP_DATA_HEADROOM \
> >> + (ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
> >
> >Having these constants looks quite dangerous to me. It imposes the
> >size of the mbuf and the mempool header size. It would at least
> >require compile-time checks.
> >
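Agreed that compile-time checks would help here. A minimal sketch of what they
could look like, assuming the 64-byte mempool object header and the 128-byte
struct rte_mbuf that the constants hard-code (they would sit in the PMD's init
path, e.g. in the umem creation function):

	/* Pin down the sizes the hard-coded 192-byte overhead assumes;
	 * the 64-byte mempool object header is an assumption here. */
	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) !=
			 ETH_AF_XDP_MBUF_OVERHEAD - 64);
	/* Headroom must leave room for packet data within a umem frame. */
	RTE_BUILD_BUG_ON(ETH_AF_XDP_DATA_HEADROOM >= ETH_AF_XDP_FRAME_SIZE);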
> >[...]
> >
> >> + umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
> >> + ETH_AF_XDP_NUM_BUFFERS,
> >> + 250, 0,
> >> + ETH_AF_XDP_FRAME_SIZE -
> >> + ETH_AF_XDP_MBUF_OVERHEAD,
> >> + MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
> >> + SOCKET_ID_ANY);
> >> + if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
> >> + AF_XDP_LOG(ERR, "Failed to create mempool\n");
> >> goto err;
> >> }
> >> + base_addr = (void *)get_base_addr(umem->mb_pool);
> >
> >Creating a mempool in the pmd like this does not look good to me for
> >several reasons:
> >- the user application creates its mempool with a specific private
> > area in its mbufs. Here there is no private area, so it will break
> > applications doing this.
> >- in patch 3 (mempool), you ensure that the chunk starts at a
> > page-aligned address, and you expect that given the other flags and
> > the constants at the top of the file, the data will be aligned. In
> > my opinion it is not ideal.
> >- the user application may create a large number of mbufs, for instance
> > if the application manages large reassembly queues, or tcp sockets.
> > Here the driver limits the number of mbufs to 4k per rx queue.
> >- the socket_id is any, so it won't be efficient on numa architectures.
>
> Our mbuf/mempool changes for zero copy do have limitations.
Just to clarify, the above code is only reached in the non-zero-copy case.
Here we create a private memory pool used to manage the AF_XDP umem; it is not
the Rx queue's own memory pool. So I don't think the concerns about the private
area and the 4k-mbufs-per-Rx-queue limit apply to this code.
>
> >
> >May I suggest another idea?
> >
> >Allocate the xsk_umem almost like in patch 1, but using rte_memzone
> >allocation instead of posix_memalign() (and it will be faster, because
> >it will use hugepages). And do not allocate any mempool in the driver.
> >
>
> rte_memzone_reserve_aligned is better than posix_memalign, I'll use it in my
> first patch.
>
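For what it's worth, a rough sketch of what the memzone-based umem allocation
could look like; the memzone name, the umem->fq/umem->cq/usr_config fields and
the error-handling labels are assumptions for illustration, not the final code:

	const struct rte_memzone *mz;
	uint64_t umem_size = ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE;

	/* hugepage-backed, IOVA-contiguous, page-aligned umem area */
	mz = rte_memzone_reserve_aligned("af_xdp_umem", umem_size,
					 rte_socket_id(),
					 RTE_MEMZONE_IOVA_CONTIG,
					 getpagesize());
	if (mz == NULL)
		goto err;

	ret = xsk_umem__create(&umem->umem, mz->addr, umem_size,
			       &umem->fq, &umem->cq, &usr_config);

Using rte_socket_id() (or the device's numa_node) would also address the
SOCKET_ID_ANY concern above.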
> >When you receive a packet in the xsk_umem, allocate a new mbuf from the
> >standard pool. Then, use rte_pktmbuf_attach_extbuf() to attach the xsk
> >memory to the mbuf. You will have to register a callback to return the
> >xsk memory when the mbuf is transmitted or freed.
>
> I'll try to investigate how to implement it.
>
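For reference, a hedged sketch of the rte_pktmbuf_attach_extbuf() approach;
umem_buf_release(), umem->buf_ring, frame_addr and mbuf are hypothetical names
used only for illustration:

	/* free callback: hand the umem frame back to a per-umem free list */
	static void
	umem_buf_release(void *addr, void *opaque)
	{
		struct xsk_umem_info *umem = opaque;

		rte_ring_enqueue(umem->buf_ring, addr);
	}

	/* on receive: wrap the xsk frame into an mbuf without copying */
	struct rte_mbuf_ext_shared_info *shinfo;
	uint16_t buf_len = ETH_AF_XDP_FRAME_SIZE;

	shinfo = rte_pktmbuf_ext_shinfo_init_helper(frame_addr, &buf_len,
						    umem_buf_release, umem);
	rte_pktmbuf_attach_extbuf(mbuf, frame_addr,
				  rte_mem_virt2iova(frame_addr),
				  buf_len, shinfo);
	rte_pktmbuf_reset_headroom(mbuf);

The callback fires when the last reference to the mbuf is freed or transmitted,
which is where the xsk frame gets recycled.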
> >
> >This is, by the way, something I don't understand in your current
> >implementation: what happens if an mbuf is received in the af_xdp
> >driver and then freed by the application? How is the xsk buffer returned?
>
> It is coordinated by the fill ring. The fill ring is used by the application
> (user space) to send down addresses for the kernel to fill in with Rx packet
> data. On the free side, the mbuf is simply returned to the mempool; then, on
> each rx_pkt_burst call, we allocate new mbufs and submit their corresponding
> addresses to the fill ring. That is how the xsk buffers are returned to the
> kernel.
>
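A rough sketch of that refill path, just to make the flow concrete;
reserve_fill_queue(), mbuf_to_addr() and the struct xsk_umem_info layout are
assumptions for illustration, not the driver's actual code:

	static void
	reserve_fill_queue(struct xsk_umem_info *umem, uint16_t count)
	{
		struct rte_mbuf *mbufs[64];
		uint32_t idx;
		uint16_t i;

		/* take fresh mbufs from the umem mempool */
		if (count > 64 ||
		    rte_pktmbuf_alloc_bulk(umem->mb_pool, mbufs, count) != 0)
			return;

		if (xsk_ring_prod__reserve(&umem->fq, count, &idx) != count) {
			for (i = 0; i < count; i++)
				rte_pktmbuf_free(mbufs[i]);
			return;
		}

		/* hand the corresponding umem addresses to the kernel */
		for (i = 0; i < count; i++)
			*xsk_ring_prod__fill_addr(&umem->fq, idx++) =
				mbuf_to_addr(umem, mbufs[i]);

		xsk_ring_prod__submit(&umem->fq, count);
	}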
> >
> >Using rte_pktmbuf_attach_extbuf() would remove changes in mbuf and
> >mempool, at the price of (maybe) decreasing the performance. But I think
> >there are some places where it can be optimized.
> >
> >I understand my feedback comes late -- as usual :( -- but if you are in
>
> Sorry for not Cc'ing you on my patch set.
>
> >a hurry for RC1, maybe we can consider taking the 1st patch only, and
> >adding the zero-copy mode in a second patch later. What do you think?
>
> Sounds like a sensible plan, I'll try the external mbuf buffer scheme first.
>
>
> Thanks,
> Xiaolong
>
> >
> >Regards,
> >Olivier
> >
> >