> -----Original Message----- > From: Jerin Jacob <jerinjac...@gmail.com> > Sent: Saturday, May 14, 2022 7:08 AM > To: McDaniel, Timothy <timothy.mcdan...@intel.com>; Richardson, Bruce > <bruce.richard...@intel.com>; konstantin.v.anan...@yandex.ru > Cc: Jerin Jacob <jer...@marvell.com>; dpdk-dev <dev@dpdk.org> > Subject: Re: [PATCH] event/dlb2: add support for single 512B write of 4 QEs > > On Sat, Apr 9, 2022 at 8:48 PM Timothy McDaniel > <timothy.mcdan...@intel.com> wrote: > > > > On Xeon, as 512b accesses are available, movdir64 instruction is able to > > perform 512b read and write to DLB producer port. In order for movdir64 > > to be able to pull its data from store buffers (store-buffer-forwarding) > > (before actual write), data should be in single 512b write format. > > This commit add change when code is built for Xeon with 512b AVX support > > to make single 512b write of all 4 QEs instead of 4x64b writes. > > > > Signed-off-by: Timothy McDaniel <timothy.mcdan...@intel.com> > > --- > > drivers/event/dlb2/dlb2.c | 86 ++++++++++++++++++++++++++++++--------- > > 1 file changed, 67 insertions(+), 19 deletions(-) > > > > diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c > > index 36f07d0061..e2a5303310 100644 > > --- a/drivers/event/dlb2/dlb2.c > > +++ b/drivers/event/dlb2/dlb2.c > > @@ -2776,25 +2776,73 @@ dlb2_event_build_hcws(struct dlb2_port > *qm_port, > > ev[3].event_type, > > DLB2_QE_EV_TYPE_WORD + 4); > > > > - /* Store the metadata to memory (use the double-precision > > - * _mm_storeh_pd because there is no integer function for > > - * storing the upper 64b): > > - * qe[0] metadata = sse_qe[0][63:0] > > - * qe[1] metadata = sse_qe[0][127:64] > > - * qe[2] metadata = sse_qe[1][63:0] > > - * qe[3] metadata = sse_qe[1][127:64] > > - */ > > - _mm_storel_epi64((__m128i *)&qe[0].u.opaque_data, > > sse_qe[0]); > > - _mm_storeh_pd((double *)&qe[1].u.opaque_data, > > - (__m128d)sse_qe[0]); > > - _mm_storel_epi64((__m128i *)&qe[2].u.opaque_data, > > sse_qe[1]); > > - _mm_storeh_pd((double *)&qe[3].u.opaque_data, > > - (__m128d)sse_qe[1]); > > - > > - qe[0].data = ev[0].u64; > > - qe[1].data = ev[1].u64; > > - qe[2].data = ev[2].u64; > > - qe[3].data = ev[3].u64; > > + #ifdef __AVX512VL__ > > + x86 maintainers > > We need a runtime check based on CPU flags. Right? As the build and > run machine can be different?
Thanks Jerin. I will convert to a runtime check.