On Sat, 2 Mar 2024 13:01:51 +0100 Mattias Rönnblom <hof...@lysator.liu.se> wrote:
> I ran some DSW benchmarks, and if you add
>
> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> index 72a92290e0..64cd82d78d 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -862,6 +862,11 @@ rte_memcpy_aligned(void *dst, const void *src, size_t n)
>  static __rte_always_inline void *
>  rte_memcpy(void *dst, const void *src, size_t n)
>  {
> +	if (__builtin_constant_p(n) && n <= 32) {
> +		memcpy(dst, src, n);
> +		return dst;
> +	}
> +

The default GCC inline threshold is 64 bytes (i.e. the cache line size), and that makes sense. Since this is already using __builtin_constant_p, it could do:

	if (__builtin_constant_p(n) && n < RTE_CACHE_LINE_SIZE)
		return __builtin_memcpy(dst, src, n);