On Sat, May 14, 2022 at 05:37:39PM +0530, Jerin Jacob wrote:
> On Sat, Apr 9, 2022 at 8:48 PM Timothy McDaniel
> <timothy.mcdan...@intel.com> wrote:
> >
> > On Xeon, as 512b accesses are available, movdir64 instruction is able
> > to perform 512b read and write to DLB producer port. In order for
> > movdir64 to be able to pull its data from store buffers
> > (store-buffer-forwarding) (before actual write), data should be in
> > single 512b write format. This commit add change when code is built
> > for Xeon with 512b AVX support to make single 512b write of all 4 QEs
> > instead of 4x64b writes.
> >
> > Signed-off-by: Timothy McDaniel <timothy.mcdan...@intel.com>
> > ---
> >  drivers/event/dlb2/dlb2.c | 86 ++++++++++++++++++++++++++++++---------
> >  1 file changed, 67 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
> > index 36f07d0061..e2a5303310 100644
> > --- a/drivers/event/dlb2/dlb2.c
> > +++ b/drivers/event/dlb2/dlb2.c
> > @@ -2776,25 +2776,73 @@ dlb2_event_build_hcws(struct dlb2_port *qm_port,
> >                          ev[3].event_type,
> >                          DLB2_QE_EV_TYPE_WORD + 4);
> >
> > -               /* Store the metadata to memory (use the double-precision
> > -                * _mm_storeh_pd because there is no integer function for
> > -                * storing the upper 64b):
> > -                * qe[0] metadata = sse_qe[0][63:0]
> > -                * qe[1] metadata = sse_qe[0][127:64]
> > -                * qe[2] metadata = sse_qe[1][63:0]
> > -                * qe[3] metadata = sse_qe[1][127:64]
> > -                */
> > -               _mm_storel_epi64((__m128i *)&qe[0].u.opaque_data, sse_qe[0]);
> > -               _mm_storeh_pd((double *)&qe[1].u.opaque_data,
> > -                             (__m128d)sse_qe[0]);
> > -               _mm_storel_epi64((__m128i *)&qe[2].u.opaque_data, sse_qe[1]);
> > -               _mm_storeh_pd((double *)&qe[3].u.opaque_data,
> > -                             (__m128d)sse_qe[1]);
> > -
> > -               qe[0].data = ev[0].u64;
> > -               qe[1].data = ev[1].u64;
> > -               qe[2].data = ev[2].u64;
> > -               qe[3].data = ev[3].u64;
> > +               #ifdef __AVX512VL__
>
> + x86 maintainers
>
> We need a runtime check based on CPU flags. Right? As the build and run
> machine can be different?
>

Ideally, yes, this should be a run-time decision. There are quite a number
of examples of this in DPDK. However, most uses of runtime decisions are in
functions called via function pointer, so not sure if those schemes apply
here. It's certainly worth investigating, though.
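
For illustration, a minimal sketch of the function-pointer scheme mentioned
above. It is not taken from the dlb2 driver: struct qe, build_qes_scalar(),
build_qes_wide(), cpu_has_avx512vl() and select_build_qes() are all
hypothetical names, and the "wide" variant is only a portable placeholder
for where a single 512b store would go. The CPU check uses the GCC/Clang
builtin so the sketch stands alone; inside DPDK one would more likely query
rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte stand-in for a queue entry; NOT the real dlb2 QE layout. */
struct qe {
	uint64_t data;
	uint64_t meta;
};

/* Common signature so the variant can be chosen once and called via pointer. */
typedef void (*build_qes_fn)(struct qe *dst, const uint64_t *data,
			     const uint64_t *meta);

/* Baseline variant: fills the four QEs with separate 64b stores. */
static void
build_qes_scalar(struct qe *dst, const uint64_t *data, const uint64_t *meta)
{
	assert(dst != NULL && data != NULL && meta != NULL);
	for (int i = 0; i < 4; i++) {
		dst[i].data = data[i];
		dst[i].meta = meta[i];
	}
}

/*
 * Placeholder for the wide variant: assembles all four QEs locally and
 * writes them out in one 64-byte copy. A real implementation would be
 * built with AVX512 enabled for this function only and would issue a
 * single 512b store; plain memcpy is used here so it runs on any CPU.
 */
static void
build_qes_wide(struct qe *dst, const uint64_t *data, const uint64_t *meta)
{
	struct qe tmp[4];

	assert(dst != NULL && data != NULL && meta != NULL);
	for (int i = 0; i < 4; i++) {
		tmp[i].data = data[i];
		tmp[i].meta = meta[i];
	}
	memcpy(dst, tmp, sizeof(tmp));
}

/* Run-time CPU check; reports 0 on non-x86 builds or unknown compilers. */
static int
cpu_has_avx512vl(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && \
	(defined(__GNUC__) || defined(__clang__))
	__builtin_cpu_init();
	return __builtin_cpu_supports("avx512vl") != 0;
#else
	return 0;
#endif
}

/* Pick the variant once (e.g. at port setup), not per enqueue. */
static build_qes_fn
select_build_qes(int has_avx512vl)
{
	return has_avx512vl ? build_qes_wide : build_qes_scalar;
}
```

A caller would store select_build_qes(cpu_has_avx512vl()) in per-port state
at setup time and invoke it from the enqueue path. Whether the extra
indirect call is acceptable inside dlb2_event_build_hcws() is the open
question raised above.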
/Bruce