Hi,

13/10/2017 11:01, Xiaoyun Li:
> This patch dynamically selects functions of memcpy at run-time based
> on CPU flags that current machine supports. This patch uses function
> pointers which are bind to the relative functions at constrctor time.
> In addition, AVX512 instructions set would be compiled only if users
> config it enabled and the compiler supports it.
>
> Signed-off-by: Xiaoyun Li <xiaoyun...@intel.com>
> ---

Keeping only the major changes of the patch for later discussions:

[...]
>  static inline void *
>  rte_memcpy(void *dst, const void *src, size_t n)
>  {
> -	if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
> -		return rte_memcpy_aligned(dst, src, n);
> +	if (n <= RTE_X86_MEMCPY_THRESH)
> +		return rte_memcpy_internal(dst, src, n);
>  	else
> -		return rte_memcpy_generic(dst, src, n);
> +		return (*rte_memcpy_ptr)(dst, src, n);
>  }
[...]
> +static inline void *
> +rte_memcpy_internal(void *dst, const void *src, size_t n)
> +{
> +	if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
> +		return rte_memcpy_aligned(dst, src, n);
> +	else
> +		return rte_memcpy_generic(dst, src, n);
> +}

The significant change in this patch is that copies larger than
128 bytes (RTE_X86_MEMCPY_THRESH) now go through a function pointer
instead of a direct, inlinable call.

Could you please provide some benchmark numbers?

From a test done at Mellanox, there might be a performance degradation
of about 15% in testpmd txonly with AVX2.

Is anyone else seeing a performance degradation?
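
To make the discussion concrete, here is a rough, standalone sketch of the dispatch pattern as I understand it from the hunks above. It is not the patch code: every sketch_* name is hypothetical, both "implementations" are plain memcpy() stand-ins, and the CPU check assumes the GCC/clang builtins __builtin_cpu_init() and __builtin_cpu_supports() on x86. The point is only to show where the indirect call appears for sizes above the threshold.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold, mirroring RTE_X86_MEMCPY_THRESH (128) from the patch. */
#define SKETCH_MEMCPY_THRESH 128

typedef void *(*sketch_memcpy_fn)(void *dst, const void *src, size_t n);

/* Stand-ins for ISA-specific copy routines; both simply call memcpy(). */
static void *sketch_copy_generic(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

static void *sketch_copy_avx2(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Bound once at load time; defaults to the generic routine. */
static sketch_memcpy_fn sketch_memcpy_ptr = sketch_copy_generic;

/* Runs before main(): pick an implementation from the CPU flags. */
__attribute__((constructor))
static void sketch_memcpy_init(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		sketch_memcpy_ptr = sketch_copy_avx2;
#endif
	/* Sanity check: the pointer must never be left unset. */
	assert(sketch_memcpy_ptr != NULL);
}

static inline void *sketch_memcpy(void *dst, const void *src, size_t n)
{
	/* Small copies: direct call, which the compiler is free to inline. */
	if (n <= SKETCH_MEMCPY_THRESH)
		return sketch_copy_generic(dst, src, n);
	/* Large copies: indirect call through the pointer, not inlinable. */
	return (*sketch_memcpy_ptr)(dst, src, n);
}
```

With this shape, every copy above the threshold pays for an indirect call that the compiler cannot inline, which could be one source of the degradation asked about above. That is a guess and needs numbers to confirm.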