On Thu, Mar 27, 2025 at 04:53:27AM -0500, tirthendu.sarkar wrote:
> Streamline code for AVX512 and SSE by removing separate files for them
> and adding runtime checks for selecting appropriate paths based on CPU
> capability. Also, update meson build file.
>
> Signed-off-by: Tirthendu Sarkar <tirthendu.sar...@intel.com>
> ---
>  drivers/event/dlb2/dlb2.c        | 248 +++++++++++++++++++++++++++
>  drivers/event/dlb2/dlb2_avx512.c | 276 -------------------------------
>  drivers/event/dlb2/dlb2_priv.h   |   6 -
>  drivers/event/dlb2/dlb2_sse.c    | 232 --------------------------
>  drivers/event/dlb2/meson.build   |   3 -
>  5 files changed, 248 insertions(+), 517 deletions(-)
>  delete mode 100644 drivers/event/dlb2/dlb2_avx512.c
>  delete mode 100644 drivers/event/dlb2/dlb2_sse.c
>
> diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
> index 934fcafcfe..ac0eb3fb24 100644
> --- a/drivers/event/dlb2/dlb2.c
> +++ b/drivers/event/dlb2/dlb2.c
> @@ -2669,6 +2669,21 @@ dlb2_eventdev_start(struct rte_eventdev *dev)
>          return 0;
>  }
>
<snip>
> + */
> +#define DLB2_QE_EV_TYPE_WORD 0
> +        sse_qe[0] = _mm_insert_epi16(sse_qe[0],
> +                                     ev[0].sub_event_type << 4 |
> +                                     ev[0].event_type << 12,
> +                                     DLB2_QE_EV_TYPE_WORD);
> +        sse_qe[0] = _mm_insert_epi16(sse_qe[0],
> +                                     ev[1].sub_event_type << 4 |
> +                                     ev[1].event_type << 12,
> +                                     DLB2_QE_EV_TYPE_WORD + 4);
> +        sse_qe[1] = _mm_insert_epi16(sse_qe[1],
> +                                     ev[2].sub_event_type << 4 |
> +                                     ev[2].event_type << 12,
> +                                     DLB2_QE_EV_TYPE_WORD);
> +        sse_qe[1] = _mm_insert_epi16(sse_qe[1],
> +                                     ev[3].sub_event_type << 4 |
> +                                     ev[3].event_type << 12,
> +                                     DLB2_QE_EV_TYPE_WORD + 4);
> +#ifdef __AVX512VL__
> +        if (qm_port->use_avx512) {
> +                /*

Hi Tirthendu,

This runtime detection is not really correct, and not complete, for a
number of reasons:

* If doing a "default" or generic build, the __AVX512VL__ flag will not be
  defined at build time unless you explicitly add the AVX512 flags.
  Therefore, unlike other drivers, we won't have a generic build that can
  opportunistically use accelerated code paths at runtime.

* On the flip side, if we do have a build with AVX512 enabled, then AVX
  code can (and possibly will) be used by the compiler anywhere in the
  output code. Therefore, the "SSE" code paths outside these ifdefs may
  well be converted by the compiler into AVX equivalents, and as such may
  not run on a non-AVX capable machine.

To have proper runtime support, you need to build the SSE and AVX512 paths
separately, each with its own compiler flags, and then do a runtime check
to choose between them, using a function pointer or similar.

Regards,
/Bruce
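
P.S. For illustration only, here is a minimal sketch of the function-pointer
dispatch described above. It is not the dlb2 code: qe_build_fn,
build_qe_sse(), build_qe_avx512(), cpu_has_avx512vl() and select_qe_build()
are hypothetical names, the two "paths" are scalar stand-ins, and the
GCC/Clang builtin __builtin_cpu_supports() is assumed as the CPU check
(a DPDK driver would more likely use rte_cpu_get_flag_enabled()).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature for a per-ISA helper; not the dlb2 API. */
typedef int (*qe_build_fn)(uint16_t *dst, const uint16_t *src, size_t n);

/*
 * Stand-in for the SSE path. In a real driver this would sit in its own
 * source file, compiled with baseline flags only, so the compiler cannot
 * emit AVX instructions into it.
 */
static int build_qe_sse(uint16_t *dst, const uint16_t *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
	return 128; /* vector width this path represents */
}

/*
 * Stand-in for the AVX512 path. In a real driver this would sit in a
 * second source file, the only one built with the AVX512 flags.
 */
static int build_qe_avx512(uint16_t *dst, const uint16_t *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
	return 512; /* vector width this path represents */
}

/* Runtime CPU check (assumption: GCC or Clang on x86; otherwise 0). */
static int cpu_has_avx512vl(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	__builtin_cpu_init();
	return __builtin_cpu_supports("avx512vl") != 0;
#else
	return 0;
#endif
}

/* Pick the implementation once, e.g. at port setup, and store the pointer. */
static qe_build_fn select_qe_build(int has_avx512)
{
	return has_avx512 ? build_qe_avx512 : build_qe_sse;
}
```

At setup time the driver would call select_qe_build(cpu_has_avx512vl())
once and keep the returned pointer, so the fast path carries no #ifdef and
a generic build can still pick up the AVX512 path when the CPU has it.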