Hi > That means that each file with '#include <re_memcpy.h> will have its own > copy > of that function: > $ objdump -d x86_64-native-linuxapp-gcc/app/testpmd | grep > '<rte_memcpy_init>:' | sort -u | wc -l > 233 > Same story for rte_memcpy_ptr and rte_memcpy_DEFAULT, etc... > Obviously we need (and want) only one copy of that stuff per binary. > > > +#ifdef CC_SUPPORT_AVX2 > > Why do you assume this macro will be defined? > By whom? > There is no such macro with gcc: > $ gcc -march=native -dM -E - </dev/null 2>&1 | grep AVX2 > #define __AVX2__ 1 > , and you don't define it yourself. > When building with '-march=native' on BDW only rte_memcpy_DEFAULT get > compiled. >
I defined it myself. But when I sort the patch, I forgot to modify the file in this version. Sorry about that. It should be like this. To check whether the compiler supports AVX2 or AVX512. diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk index a813c91..92399ec 100644 --- a/mk/rte.cpuflags.mk +++ b/mk/rte.cpuflags.mk @@ -141,3 +141,17 @@ space:= $(empty) $(empty) CPUFLAGSTMP1 := $(addprefix RTE_CPUFLAG_,$(CPUFLAGS)) CPUFLAGSTMP2 := $(subst $(space),$(comma),$(CPUFLAGSTMP1)) CPUFLAGS_LIST := -DRTE_COMPILE_TIME_CPUFLAGS=$(CPUFLAGSTMP2) + +# Check if the compiler supports AVX512. +CC_SUPPORT_AVX512 := $(shell $(CC) -march=skylake-avx512 -dM -E - < /dev/null 2>&1 | grep -q AVX512 && echo 1) +ifeq ($(CC_SUPPORT_AVX512),1) +ifeq ($(CONFIG_RTE_ENABLE_AVX512),y) +MACHINE_CFLAGS += -DCC_SUPPORT_AVX512 +endif +endif + +# Check if the compiler supports AVX2. +CC_SUPPORT_AVX2 := $(shell $(CC) -march=core-avx2 -dM -E - < /dev/null 2>&1 | grep -q AVX2 && echo 1) +ifeq ($(CC_SUPPORT_AVX2),1) +MACHINE_CFLAGS += -DCC_SUPPORT_AVX2 +endif > To summarize: as I understand the goal of that patch was > (assuming that our current rte_memcpy() implementation is good in terms of > both performance and functionality): > 1. Based on current rte_memcpy() implementation define 3 x86 arch specific > rte_memcpy flavors: > a) rte_memcpy_SSE > b) rte_memcpy_AVX2 > c) rte_memcpy_AVX512 > 2. Select appropriate flavor based on current HW at runtime, > i.e. both 3 flavors should be present in the binary and selection should > be > made > at program startup. > > As I can see none of the goals was achieved with the current patch, > instead a lot of redundant code was introduced. > So I think it is NACK for the current version. > What I think need to be done instead: > > 1. mv lib/librte_eal/common/include/arch/x86/rte_memcpy.h > lib/librte_eal/common/include/arch/x86/rte_memcpy_internal.h > 2. inside rte_memcpy_internal.h rename rte_memcpy() into > rte_memcpy_internal(). > 3. create 3 files: > rte_memcpy_sse.c > rte_memcpy_avx2.c > rte_memcpy_avx512.c > > Inside each of these files we define corresponding rte_memcpy_xxx() > function. > I.E: > rte_memcpy_avx2.c: > .... > #ifndef RTE_MACHINE_CPUFLAG_AVX2 > #error "no avx2 support" > endif > > #include "rte_memcpy_internal.h" > ... > > void * > rte_memcpy_avx2(void *dst, const void *src, size_t n) > { > return rte_memcpy_internal(dst, src, n); > } > > 4. Make changes inside lib/librte_eal/Makefile to ensure that each of > rte_memcpy_xxx() > get build with appropriate -march flags (I.E: avx2 with -mavx2, etc.) > You can use librte_acl/Makefile as a reference. > > 5. Create rte_memcpy.c and put rte_memcpy_ptr/rte_memcpy_init() > definitions in that file. > 6. Create new rte_memcpy.h and define rte_memcpy() in it: > > ... > #include <rte_memcpy_internal.h> > ... > > +#define RTE_X86_MEMCPY_THRESH 128 > static inline void * > rte_memcpy(void *dst, const void *src, size_t n) > { > if (n <= RTE_X86_MEMCPY_THRESH) > return rte_memcpy_internal(dst, src, n); > else > return (*rte_memcpy_ptr)(dst, src, n); > } > > 7. Test it properly - i.e. build dpdk with default target and make sure each > of > 3 flavors > could be selected properly at runtime based on underlying arch. > > 8. As a possible future improvement - with such changes we don't need a > generic inline > implementation. We can think about creating a faster version that need to > copy > <= 128B. > > Konstantin > Will modify it in next version. Thanks.