On Tue, Oct 26, 2021 at 9:44 PM Thomas Monjalon <tho...@monjalon.net> wrote:

> 26/10/2021 17:56, Aman Kumar:
> > This patch provides a rte_memcpy* call with temporal stores.
> > Use -Dcpu_instruction_set=znverX with build to enable this API.
> >
> > Signed-off-by: Aman Kumar <aman.ku...@vvdntech.in>
> > ---
> >  config/x86/meson.build           |   2 +
> >  lib/eal/x86/include/rte_memcpy.h | 114 +++++++++++++++++++++++++++++++
>
> It looks better as C code.
> Do you achieve the same performance as the asm version?
>

The assembly version performed better in a few corner cases, but overall
the two versions show very similar performance.

> > +#if defined RTE_MEMCPY_AMDEPYC
> [...]
> > +static __rte_always_inline void *
> > +rte_memcpy_aligned_tstore16_generic(void *dst, void *src, int len)
>
> So to be clear, an application will benefit from this optimization if
> 1/ DPDK is specifically compiled for AMD
> 2/ the application is compiled against the above DPDK build (because of
> inlining)
>
> I guess there is no good way to benefit from the optimization
> without specific compilation, because of the inlining constraint.
> Another design, with fewer constraints but lower performance,
> would be to have a function pointer assigned at runtime based on the CPU.
>

You're right. Both DPDK and the applications need to be built with this flag
enabled to get the benefit.
In future versions, we will try to make the selection more dynamic. Thanks.
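
For reference, the runtime function-pointer design suggested above could look
roughly like the sketch below. This is only an illustration, not code from the
patch: copy_fn_t, copy_generic, copy_avx2_placeholder, select_copy and
app_memcpy are hypothetical names, the "optimized" variant is a stand-in that
just calls memcpy, and the CPU check assumes the GCC/Clang x86 builtin
__builtin_cpu_supports() rather than the DPDK CPU flag API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature shared by all copy implementations. */
typedef void *(*copy_fn_t)(void *dst, const void *src, size_t len);

/* Portable fallback used when no CPU-specific variant applies. */
static void *copy_generic(void *dst, const void *src, size_t len)
{
    return memcpy(dst, src, len);
}

/*
 * Placeholder for a CPU-specific variant. A real one would hold the
 * vectorized copy loop; here it only calls memcpy so the sketch stays
 * portable and easy to verify.
 */
static void *copy_avx2_placeholder(void *dst, const void *src, size_t len)
{
    return memcpy(dst, src, len);
}

/* Pick an implementation once, based on what the running CPU reports. */
static copy_fn_t select_copy(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return copy_avx2_placeholder;
#endif
    return copy_generic;
}

/* Assigned on first use. Note: this lazy init is not thread-safe. */
static copy_fn_t copy_impl;

/* Every copy goes through the pointer, so the callee cannot be inlined. */
static void *app_memcpy(void *dst, const void *src, size_t len)
{
    if (copy_impl == NULL)
        copy_impl = select_copy();
    assert(copy_impl != NULL);
    return copy_impl(dst, src, len);
}
```

The indirect call in app_memcpy is where the performance cost mentioned above
comes from: the compiler cannot inline through the pointer, so small copies pay
a call overhead that the inlined, build-time-selected version avoids.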
