On Fri, May 20, 2022 at 1:31 AM Timothy McDaniel <timothy.mcdan...@intel.com> wrote:
>
> On Xeon, as 512b accesses are available, movdir64 instruction is able to
> perform 512b read and write to DLB producer port. In order for movdir64
> to be able to pull its data from store buffers (store-buffer-forwarding)
> (before actual write), data should be in single 512b write format.
> This commit adds a change when code is built for Xeon with 512b AVX support
> to make single 512b write of all 4 QEs instead of 4x64b writes.
>
> Signed-off-by: Timothy McDaniel <timothy.mcdan...@intel.com>
> Acked-by: Kent Wires <kent.wi...@intel.com>
> ===
>
> Changes since V1:
> 1) Split out dlb2_event_build_hcws into two implementations, one
> that uses AVX512 instructions, and one that does not. Each implementation
> is in its own source file in order to avoid build errors if the compiler
> does not support the newer AVX512 instructions.
> 2) Update meson.build to pull in the appropriate source file based on
> whether the compiler supports AVX512VL
> 3) Check if target supports AVX512VL, and use appropriate implementation
> based on this runtime check.
> ---
>  drivers/event/dlb2/dlb2.c          | 206 +--------------------
>  drivers/event/dlb2/dlb2_avx512.c   | 267 +++++++++++++++++++++++++++++
>  drivers/event/dlb2/dlb2_noavx512.c | 219 +++++++++++++++++++++++

Could you rename dlb2_noavx512.c to dlb2_sse.c? "noavx512" only says what the file is not, so it could just as well mean NEON. The rest looks good to me. I will merge the next version.
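
For reference, the runtime selection described in item 3 of the quoted changelog could look roughly like the sketch below. It is only an illustration, not the code in the patch: the function-pointer type, the two stub implementations and the helper names are hypothetical, and the GCC/Clang builtin `__builtin_cpu_supports` stands in for whatever CPU-flag check the driver actually uses.

```c
#include <stdbool.h>

/* Hypothetical signature: the real dlb2_event_build_hcws takes
 * driver-specific arguments that are not shown in this thread. */
typedef const char *(*build_hcws_fn)(void);

/* Stub standing in for the AVX512 implementation (one 512b write). */
static const char *build_hcws_avx512(void) { return "avx512"; }

/* Stub standing in for the non-AVX512 implementation. */
static const char *build_hcws_fallback(void) { return "fallback"; }

/* Runtime check for AVX512VL. Uses a GCC/Clang builtin on x86 and
 * reports "not supported" on any other compiler or architecture. */
static bool cpu_has_avx512vl(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
	__builtin_cpu_init();
	return __builtin_cpu_supports("avx512vl") != 0;
#else
	return false;
#endif
}

/* Pick the implementation once, based on the runtime check result. */
static build_hcws_fn select_build_hcws(bool has_avx512vl)
{
	return has_avx512vl ? build_hcws_avx512 : build_hcws_fallback;
}
```

A caller would do the selection once at setup time, e.g. `build_hcws_fn fn = select_build_hcws(cpu_has_avx512vl());`, and then call `fn()` on the fast path so the CPU check is not repeated per enqueue.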