On Fri, Jun 10, 2022 at 07:35:44AM -0500, Timothy McDaniel wrote:
> On Xeon, 512b accesses are available, so movdir64 instruction is able to
> perform 512b read and write to DLB producer port. In order for movdir64
> to be able to pull its data from store buffers (store-buffer-forwarding)
> (before actual write), data should be in single 512b write format.
> This commit add change when code is built for Xeon with 512b AVX support
> to make single 512b write of all 4 QEs instead of 4x64b writes.
>
> Signed-off-by: Timothy McDaniel <timothy.mcdan...@intel.com>
> Acked-by: Kent Wires <kent.wi...@intel.com>
> ===
>
> Changes since V4:
> 1) Add build-time control for avx512 support to meson.buildi, based
>    on implementation found in lib/acl/meson.build
> 2) Add rte_vect_get_max_simd_bitwidth runtime check before using
>    avx512 instructions
>

Thanks, these changes look better for runtime support. Some further, more
minor comments inline below.

/Bruce

> Changes since V3:
> 1) Renamed dlb2_noavx512.c to dlb2_sve.c, and fixed up meson.build
>    for new file name.
>
> Changes since V1:
> 1) Split out dlb2_event_build_hcws into two implementations, one
>    that uses AVX512 instructions, and one that does not. Each implementation
>    is in its own source file in order to avoid build errors if the compiler
>    does not support the newer AVX512 instructions.
> 2) Update meson.build to and pull in appropriate source file based on
>    whether the compiler supports AVX512VL
> 3) Check if target supports AVX512VL, and use appropriate implementation
>    based on this runtime check.
> ---
>  drivers/event/dlb2/dlb2.c        | 208 +-----------------------
>  drivers/event/dlb2/dlb2_avx512.c | 267 +++++++++++++++++++++++++++++++
>  drivers/event/dlb2/dlb2_priv.h   |  10 ++
>  drivers/event/dlb2/dlb2_sve.c    | 219 +++++++++++++++++++++++++
>  drivers/event/dlb2/meson.build   |  53 ++++++
>  5 files changed, 556 insertions(+), 201 deletions(-)
>  create mode 100644 drivers/event/dlb2/dlb2_avx512.c
>  create mode 100644 drivers/event/dlb2/dlb2_sve.c
>

<snip>

> diff --git a/drivers/event/dlb2/meson.build b/drivers/event/dlb2/meson.build
> index f963589fd3..58146e8aef 100644
> --- a/drivers/event/dlb2/meson.build
> +++ b/drivers/event/dlb2/meson.build
> @@ -19,6 +19,59 @@ sources = files(
>          'dlb2_selftest.c',
>  )
>
> +# compile AVX512 version if:
> +# we are building 64-bit binary (checked above) AND binutils
> +# can generate proper code
> +
> +if binutils_ok
> +
> +    # compile AVX512 version if either:
> +    # a. we have AVX512 supported in minimum instruction set
> +    #    baseline
> +    # b. it's not minimum instruction set, but supported by
> +    #    compiler
> +    #
> +    # in former case, just add avx512 C file to files list
> +    # in latter case, compile c file to static lib, using correct
> +    # compiler flags, and then have the .o file from static lib
> +    # linked into main lib.
> +
> +    # check if all required flags already enabled (variant a).
> +    dlb2_avx512_flags = ['__AVX512F__', '__AVX512VL__',
> +                         '__AVX512CD__', '__AVX512BW__']

Minor nit: are all 4 of these really necessary? I see the runtime portion
only seems to check for VL?

> +
> +    dlb2_avx512_on = true
> +    foreach f:dlb2_avx512_flags
> +
> +        if cc.get_define(f, args: machine_args) == ''
> +            dlb2_avx512_on = false
> +        endif
> +    endforeach
> +
> +    if dlb2_avx512_on == true
> +
> +        sources += files('dlb2_avx512.c')
> +        cflags += '-DCC_AVX512_SUPPORT'
> +
> +    elif cc.has_multi_arguments('-mavx512f', '-mavx512vl',
> +                                '-mavx512cd', '-mavx512bw')
> +
> +        cflags += '-DCC_AVX512_SUPPORT'
> +        avx512_tmplib = static_library('avx512_tmp',
> +                                       'dlb2_avx512.c',
> +                                       dependencies: [static_rte_eal,
> +                                                      static_rte_eventdev],
> +                                       c_args: cflags +
> +                                               ['-mavx512f', '-mavx512vl',
> +                                                '-mavx512cd', '-mavx512bw'])
> +        objs += avx512_tmplib.extract_objects('dlb2_avx512.c')
> +    else
> +        sources += files('dlb2_sve.c')
> +    endif
> +else
> +    sources += files('dlb2_sve.c')

Since this is x86 only, do you mean SSE rather than SVE?

Also, rather than adding this in the "else" legs, does the SSE version not
need to always be compiled in? If the build takes the second leg, i.e. the
build does not mandate AVX-512 but the compiler supports it, is the SSE
code path not still necessary for the case where the runtime machine does
not support AVX-512?

> +endif
> +
>  headers = files('rte_pmd_dlb2.h')
>
>  deps += ['mbuf', 'mempool', 'ring', 'pci', 'bus_pci']
> --
> 2.25.1
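
For illustration only, a minimal sketch of the runtime selection the last
comment asks about, assuming the non-AVX-512 implementation is always
compiled in. The names build_hcws_sse, build_hcws_avx512 and
select_build_hcws are hypothetical stand-ins, not taken from the patch; in
the driver the two inputs would presumably come from
rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) and
rte_vect_get_max_simd_bitwidth().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two dlb2_event_build_hcws variants. */
typedef const char *(*build_hcws_fn)(void);

/* Baseline path: always compiled in, whatever the compiler supports. */
static const char *build_hcws_sse(void) { return "sse"; }

#ifdef CC_AVX512_SUPPORT
/* Only present when meson.build found AVX-512 compiler support. */
static const char *build_hcws_avx512(void) { return "avx512"; }
#endif

/*
 * Pick the implementation once, at init time. The AVX-512 variant is used
 * only if it was built AND the CPU reports AVX512VL AND the configured
 * maximum SIMD bitwidth allows 512-bit vectors; otherwise fall back to the
 * baseline path.
 */
static build_hcws_fn
select_build_hcws(bool cpu_has_avx512vl, unsigned int max_simd_bitwidth)
{
#ifdef CC_AVX512_SUPPORT
	if (cpu_has_avx512vl && max_simd_bitwidth >= 512)
		return build_hcws_avx512;
#else
	(void)cpu_has_avx512vl;
	(void)max_simd_bitwidth;
#endif
	return build_hcws_sse;
}
```

With this shape, the second meson leg (compiler supports AVX-512 but the
baseline does not require it) still has a valid fallback on machines without
AVX-512, which is why the baseline file would need to be in the sources list
unconditionally.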