On Sat, 2 Mar 2024 13:01:51 +0100
Mattias Rönnblom <hof...@lysator.liu.se> wrote:
> I ran some DSW benchmarks, and if you add
> 
> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> index 72a92290e0..64cd82d78d 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -862,6 +862,11 @@ rte_memcpy_aligned(void *dst, const void *src, size_t n)
>   static __rte_always_inline void *
>   rte_memcpy(void *dst, const void *src, size_t n)
>   {
> +       if (__builtin_constant_p(n) && n <= 32) {
> +               memcpy(dst, src, n);
> +               return dst;
> +       }
> +

The default GCC inline threshold is 64 bytes (i.e. the cache line size),
which makes sense. Since you are already using __builtin_constant_p, you
could do:
        if (__builtin_constant_p(n) && n < RTE_CACHE_LINE_SIZE)
                return __builtin_memcpy(dst, src, n);
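
For what it's worth, the combined change would then look something like the
sketch below (untested; it assumes the existing alignment-based dispatch in
rte_memcpy() -- ALIGNMENT_MASK, rte_memcpy_aligned()/rte_memcpy_generic() --
stays as-is):

        static __rte_always_inline void *
        rte_memcpy(void *dst, const void *src, size_t n)
        {
                /* Small, compile-time-constant sizes: let the compiler expand
                 * the copy inline, which GCC does up to the 64-byte threshold
                 * mentioned above. */
                if (__builtin_constant_p(n) && n < RTE_CACHE_LINE_SIZE)
                        return __builtin_memcpy(dst, src, n);

                /* Runtime-sized or larger copies keep taking the existing
                 * alignment-based paths. */
                if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
                        return rte_memcpy_aligned(dst, src, n);
                else
                        return rte_memcpy_generic(dst, src, n);
        }

That keeps non-constant and cache-line-or-larger copies on the current
vectorized paths while letting the compiler emit plain loads/stores for the
small constant cases.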
