Alexander Lobakin wrote:
> From: Willem De Bruijn <willemdebruijn.ker...@gmail.com>
> Date: Mon, 17 Jun 2024 14:13:07 -0400
>
> > Alexander Lobakin wrote:
> >> From: Willem De Bruijn <willemdebruijn.ker...@gmail.com>
> >> Date: Thu, 30 May 2024 09:46:46 -0400
> >>
> >>> Alexander Lobakin wrote:
> >>>> Currently, idpf uses the following model for the header buffers:
> >>>>
> >>>> * buffers are allocated via dma_alloc_coherent();
> >>>> * when receiving, napi_alloc_skb() is called and then the header is
> >>>>   copied to the newly allocated linear part.
> >>>>
> >>>> This is far from optimal, as the DMA coherent zone is slow on many
> >>>> systems and the memcpy() neutralizes the idea and benefits of the
> >>>> header split.
> >>>
> >>> In the previous revision this assertion was called out, as we have
> >>> lots of experience with the existing implementation and a previous
> >>> one based on dynamic allocation that performed much worse. You would
> >>
> >> napi_build_skb() is not a dynamic allocation. On the contrary,
> >> napi_alloc_skb() from the current implementation actually *is* a
> >> dynamic allocation: it allocates a page frag for every header buffer,
> >> each time.
> >>
> >> Page Pool refills header buffers from its pool of recycled frags.
> >> Plus, on x86_64, the truesize of a header buffer is 1024, meaning it
> >> picks a new page from the pool only every 4th buffer. During testing
> >> of common workloads, I saw literally zero new page allocations, as
> >> the skb core recycles frags from skbs back to the pool.
> >>
> >> IOW, the current version you're defending actually performs more
> >> dynamic allocations on the hotpath than this one ¯\_(ツ)_/¯
> >>
> >> (I have explained all this several times already.)
> >>
> >>> share performance numbers in the next revision
> >>
> >> I can't share absolute numbers publicly, only percentages.
> >>
> >> I shared before/after percentages in the cover letter. Every test
> >> yielded more Mpps after this change, especially the non-XDP_PASS
> >> ones, where there is no networking stack overhead.
> >
> > This is the main concern: AF_XDP has no existing users, but TCP/IP is
> > used in production environments, so we cannot risk TCP/IP regressions
> > in favor of somewhat faster AF_XDP. A secondary concern is that a
> > functional AF_XDP implementation soon, with optimizations later, is
> > preferable to the fastest solution later.
>
> I have perf numbers before-after for all the common workloads, and I
> see only improvements there.

Good. That was the request. Not only from me, as a reminder.

> Do you have any to prove that this change introduces regressions?

I have no data yet. We can run some tests on your GitHub series too.
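For context, a minimal sketch of the two RX header paths being debated: the existing copybreak model versus the page_pool-backed napi_build_skb() model. This is illustrative, not the actual idpf code; the function names are made up, and headroom handling and DMA sync are simplified away.

```c
#include <linux/skbuff.h>
#include <linux/string.h>

/*
 * Existing model: the header sits in a dma_alloc_coherent() area, so a
 * fresh skb (with a page-frag-backed linear part) is allocated for
 * every frame and the header is memcpy()'d into it.
 */
static struct sk_buff *rx_build_skb_copy(struct napi_struct *napi,
					 const void *hdr, u32 hdr_len)
{
	struct sk_buff *skb;

	/* napi_alloc_skb() grabs a page frag on every call */
	skb = napi_alloc_skb(napi, hdr_len);
	if (unlikely(!skb))
		return NULL;

	memcpy(__skb_put(skb, hdr_len), hdr, hdr_len);

	return skb;
}

/*
 * Proposed model: the header buffer is a page_pool frag, so the skb is
 * built directly around it with no copy, and the frag is recycled back
 * to the pool when the skb is freed.
 */
static struct sk_buff *rx_build_skb_frag(void *hdr_buf, u32 hdr_len,
					 u32 truesize)
{
	struct sk_buff *skb;

	/* truesize covers headroom + data + skb_shared_info */
	skb = napi_build_skb(hdr_buf, truesize);
	if (unlikely(!skb))
		return NULL;

	/* let skb freeing return the frag to its page_pool */
	skb_mark_for_recycle(skb);
	skb_put(skb, hdr_len);

	return skb;
}
```

With a 4 KiB page and a 1024-byte truesize, the pool hands out four header frags per page, which is where the "new page every 4th buffer" figure above comes from.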