On Wed, Sep 06, 2023 at 10:31:00AM +0800, Jieqiang Wang wrote:
> _mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
> equal. The original SSE2 implementation of compare_signatures uses
> _mm_movemask_epi8 to create a mask from the MSB of each 8-bit element,
> whereas only the MSB of the lower 8 bits of each 16-bit element should
> matter.
> For example, if all elements compare equal, the SSE2 path returns 0xFFFF
> while the NEON and default scalar paths return 0x5555.
> Although this bug causes no negative effects, since the caller only
> examines the trailing zeros of each match mask, we recommend this fix
> for consistency with the NEON and default scalar code.
>
> Fixes: c7d93df552c2 ("hash: use partial-key hashing")
> Cc: yipeng1.w...@intel.com
> Cc: sta...@dpdk.org
>
> Signed-off-by: Feifei Wang <feifei.wa...@arm.com>
> Signed-off-by: Jieqiang Wang <jieqiang.w...@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
Fix looks correct, but see comments below. I think we can convert the
vector mask to a simpler - and possibly faster - scalar one.

/Bruce

> ---
>  lib/hash/rte_cuckoo_hash.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> index d92a903bb3..acaa8b74bd 100644
> --- a/lib/hash/rte_cuckoo_hash.c
> +++ b/lib/hash/rte_cuckoo_hash.c
> @@ -1862,17 +1862,19 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
>  	/* For match mask the first bit of every two bits indicates the match */
>  	switch (sig_cmp_fn) {
>  #if defined(__SSE2__)
> -	case RTE_HASH_COMPARE_SSE:
> +	case RTE_HASH_COMPARE_SSE: {
>  		/* Compare all signatures in the bucket */
> -		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> -				_mm_load_si128(
> +		__m128i shift_mask = _mm_set1_epi16(0x0080);

Not sure that this variable name is the most descriptive, as we don't
actually shift anything using it. How about "results_mask"?

> +		__m128i prim_cmp = _mm_cmpeq_epi16(_mm_load_si128(
>  				(__m128i const *)prim_bkt->sig_current),
> -				_mm_set1_epi16(sig)));
> +				_mm_set1_epi16(sig));
> +		*prim_hash_matches = _mm_movemask_epi8(_mm_and_si128(prim_cmp, shift_mask));

While this will work as you describe, I think the simpler solution here
is not to apply a vector mask but a scalar one instead. That would also
save the extra vector load, since each movemask result could simply be
masked with the compile-time constant 0x5555.

>  		/* Compare all signatures in the bucket */
> -		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> -				_mm_load_si128(
> +		__m128i sec_cmp = _mm_cmpeq_epi16(_mm_load_si128(
>  				(__m128i const *)sec_bkt->sig_current),
> -				_mm_set1_epi16(sig)));
> +				_mm_set1_epi16(sig));
> +		*sec_hash_matches = _mm_movemask_epi8(_mm_and_si128(sec_cmp, shift_mask));
> +	}
>  		break;
>  #elif defined(__ARM_NEON)
>  	case RTE_HASH_COMPARE_NEON: {
> --
> 2.25.1
>