> -----Original Message-----
> From: Loftus, Ciara [mailto:ciara.lof...@intel.com]
> Sent: Friday, October 02, 2020 12:24 AM
> To: Li,Rongqing <lirongq...@baidu.com>
> Cc: dev@dpdk.org
> Subject: RE: [PATCH][v2] net/af_xdp: avoid to unnecessary allocation and free
> mbuf in rx path
>
> >
> > when receive packets, the max bunch number of mbuf are allocated if
> > hardware does not receive the max bunch number packets, it will free
> > redundancy mbuf, that is low-performance
> >
> > so optimize rx performance, by allocating number of mbuf based on
> > result of xsk_ring_cons__peek, to avoid to redundancy allocation, and
> > free mbuf when receive packets
>
> Hi,
>
> Thanks for the patch and fixing the issue I raised.
Thanks for finding it.
> With my testing so far I haven't measured an improvement in performance
> with the patch.
> Do you have data to share which shows the benefit of your patch?
>
> I agree the potential excess allocation of mbufs for the fill ring is not the
> most optimal, but if doing it does not significantly impact the performance I
> would be in favour of keeping that approach versus touching the cached_cons
> outside of libbpf which is unconventional.
>
> If a benefit can be shown and we proceed with the approach, I would suggest
> creating a new function for the cached consumer rollback eg.
> xsk_ring_cons_cancel() or similar, and add a comment describing what it does.
>
Thanks for testing.
Yes, the patch has a measurable benefit.
We first saw this issue while measuring send performance. The topology is:
Qemu with vhost-user -----> ovs -------> xdp interface
Qemu sends UDP packets. The xdp interface has no packets to receive, but OVS
must still poll it, so xdp allocates and frees mbufs unnecessarily. With this
patch we see about a 5% improvement in send performance; the exact gain
depends on flow table complexity.
In the rx benchmark, if the number of packets per batch approaches 32, the
benefit is very small.
If the number of packets per batch is far below 32, the cycles per packet are
noticeably reduced.
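
To illustrate why the gain depends on how full each batch is, here is a rough
toy model, not the driver code. It only counts mbuf allocations and the frees
of unused mbufs per poll for the two strategies. RX_BATCH, struct mbuf_ops and
both function names are made up for this sketch, and the per-poll packet counts
are supplied by the caller rather than read from a real ring.

```c
#include <stddef.h>

#define RX_BATCH 32 /* assumed maximum burst size, hypothetical constant */

/* Number of mbuf operations performed by one strategy (toy model only). */
struct mbuf_ops {
    size_t allocs; /* mbufs allocated */
    size_t frees;  /* unused mbufs freed again in the rx path */
};

/* Clamp a per-poll packet count to the batch size. */
static unsigned clamp_batch(unsigned rcvd)
{
    return rcvd > RX_BATCH ? RX_BATCH : rcvd;
}

/* Allocate a full batch on every poll, then free whatever was not used. */
static struct mbuf_ops ops_alloc_first(const unsigned *rcvd, size_t polls)
{
    struct mbuf_ops ops = {0, 0};

    for (size_t i = 0; i < polls; i++) {
        ops.allocs += RX_BATCH;
        ops.frees += RX_BATCH - clamp_batch(rcvd[i]);
    }
    return ops;
}

/* Check the ring first, then allocate only as many mbufs as were received. */
static struct mbuf_ops ops_peek_first(const unsigned *rcvd, size_t polls)
{
    struct mbuf_ops ops = {0, 0};

    for (size_t i = 0; i < polls; i++)
        ops.allocs += clamp_batch(rcvd[i]); /* nothing left over to free */
    return ops;
}
```

For four polls that receive 0, 0, 4 and 32 packets, this model gives 128
allocations and 92 frees for the first strategy versus 36 allocations and no
frees for the second. When every poll returns a full batch, the two are equal.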
-Li