Re: [PATCH] hash: fix SSE comparison

Jieqiang Wang Fri, 06 Oct 2023 23:41:36 -0700

Thanks for your comments, Bruce!
A few comments inline.

BR,
Jieqiang Wang
-----Original Message-----
From: Bruce Richardson <[email protected]>
Sent: Monday, October 2, 2023 6:40 PM
To: Jieqiang Wang <[email protected]>
Cc: Yipeng Wang <[email protected]>; Sameh Gobriel
<[email protected]>; Vladimir Medvedkin <[email protected]>; 
Honnappa Nagarahalli <[email protected]>; Dharmik Jayesh Thakkar 
<[email protected]>; [email protected]; nd <[email protected]>; 
[email protected]; Feifei Wang <[email protected]>; Ruifeng Wang 
<[email protected]>
Subject: Re: [PATCH] hash: fix SSE comparison


On Wed, Sep 06, 2023 at 10:31:00AM +0800, Jieqiang Wang wrote:
> _mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements
> are equal. The original SSE2 implementation of the function
> compare_signatures uses _mm_movemask_epi8 to create a mask from
> the MSB of each 8-bit element, while we should only care about the MSB
> of the lower 8 bits in each 16-bit element.
> For example, if the comparison result is all equal, SSE2 path returns
> 0xFFFF while NEON and default scalar path return 0x5555.
> Although this bug has no negative effects, since the caller only
> examines the trailing zeros of each match mask, we recommend this fix
> to keep the behavior consistent with the NEON and default scalar code
> paths.
>
> Fixes: c7d93df552c2 ("hash: use partial-key hashing")
> Cc: [email protected]
> Cc: [email protected]
>
> Signed-off-by: Feifei Wang <[email protected]>
> Signed-off-by: Jieqiang Wang <[email protected]>
> Reviewed-by: Ruifeng Wang <[email protected]>
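A minimal standalone sketch of the behavior described in the commit message (not DPDK code). It assumes a bucket of 8 16-bit signatures, matching the single 128-bit load in compare_signatures, and models the scalar path as setting bit 2*i for a match in slot i, following the "first bit of every two bits" comment in the diff. BUCKET_ENTRIES and the helper names are hypothetical; first_match_slot assumes the caller turns the trailing-zero count into a slot index by halving it. Unaligned loads are used so it works on ordinary arrays, and it needs an x86 target with SSE2.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumption: 8 x 16-bit signatures = 128 bits */

/* Scalar reference: bit 2*i is set when slot i matches. */
static uint32_t scalar_match_mask(const uint16_t *sigs, uint16_t sig)
{
	uint32_t mask = 0;

	for (int i = 0; i < BUCKET_ENTRIES; i++)
		mask |= (uint32_t)(sigs[i] == sig) << (i << 1);
	return mask;
}

/* Unfixed SSE2 path: movemask takes the MSB of every byte, so a matching
 * 16-bit lane (0xFFFF) sets two adjacent bits, 2*i and 2*i+1. */
static uint32_t sse2_match_mask_unfixed(const uint16_t *sigs, uint16_t sig)
{
	__m128i cmp = _mm_cmpeq_epi16(_mm_loadu_si128((const __m128i *)sigs),
				      _mm_set1_epi16((short)sig));

	return (uint32_t)_mm_movemask_epi8(cmp);
}

/* Hypothetical model of the caller: first matching slot, or -1. */
static int first_match_slot(uint32_t mask)
{
	return mask ? __builtin_ctz(mask) >> 1 : -1;
}
```

For an all-matching bucket the unfixed SSE2 mask is 0xFFFF and the scalar one is 0x5555, yet first_match_slot returns 0 for both, which is why the discrepancy has no visible effect on lookups.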

The fix looks correct, but see my comment below: I think we can replace the
vector mask with a simpler, and possibly faster, scalar one.

/Bruce

> ---
>  lib/hash/rte_cuckoo_hash.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> index d92a903bb3..acaa8b74bd 100644
> --- a/lib/hash/rte_cuckoo_hash.c
> +++ b/lib/hash/rte_cuckoo_hash.c
> @@ -1862,17 +1862,19 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
>       /* For match mask the first bit of every two bits indicates the match */
>       switch (sig_cmp_fn) {
>  #if defined(__SSE2__)
> -     case RTE_HASH_COMPARE_SSE:
> +     case RTE_HASH_COMPARE_SSE: {
>               /* Compare all signatures in the bucket */
> -             *prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> -                             _mm_load_si128(
> +             __m128i shift_mask = _mm_set1_epi16(0x0080);

Not sure this variable name is the most descriptive, as we don't actually
shift anything with it. How about "results_mask"?

Ack.
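A sketch of the vector-mask approach as posted in this patch, with the variable renamed to results_mask as suggested; the function name is hypothetical and, unlike DPDK, it uses an unaligned load on a plain array. On little-endian x86, the low byte of lane i is byte 2*i, so ANDing a matching lane (0xFFFF) with 0x0080 leaves only that byte's MSB set, and _mm_movemask_epi8 then reports bit 2*i alone.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Vector-mask variant: clear everything except bit 7 of each 16-bit lane
 * before movemask, so each match contributes only bit 2*i. */
static uint32_t sse2_match_mask_vector(const uint16_t *sigs, uint16_t sig)
{
	__m128i results_mask = _mm_set1_epi16(0x0080);
	__m128i cmp = _mm_cmpeq_epi16(_mm_loadu_si128((const __m128i *)sigs),
				      _mm_set1_epi16((short)sig));

	return (uint32_t)_mm_movemask_epi8(_mm_and_si128(cmp, results_mask));
}
```

This gives 0x5555 for an all-matching bucket, at the cost of materializing an extra 128-bit constant and an extra vector AND per comparison.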

> +             __m128i prim_cmp = _mm_cmpeq_epi16(_mm_load_si128(
>                                       (__m128i const *)prim_bkt->sig_current),
> -                             _mm_set1_epi16(sig)));
> +                                     _mm_set1_epi16(sig));
> +             *prim_hash_matches = _mm_movemask_epi8(_mm_and_si128(prim_cmp, shift_mask));

While this will work as you describe, I think the simpler solution here is
not to use a vector mask but a scalar one.
This would also save extra vector loads, since all values could just be
masked with the compile-time constant 0xAAAA.

Bingo! That's indeed a better way to fix this issue. Just to confirm my
understanding: we don't need to construct a vector mask to AND with the
comparison result. Instead, we can AND the final result
(prim_hash_matches/sec_hash_matches) with a constant mask. However, it
appears the correct constant should be 0x5555, not 0xAAAA, because, based on
the logic of the default scalar path, we only care about the even-indexed
bits.
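A sketch of the scalar-mask alternative discussed above: keep the original cmpeq/movemask sequence and AND the resulting integer with the compile-time constant 0x5555. SIG_MATCH_EVEN_BITS and the function name are hypothetical, and the sketch uses an unaligned load where DPDK uses an aligned one; it needs an x86 target with SSE2.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical name: keep only the even bits, one per 16-bit lane. */
#define SIG_MATCH_EVEN_BITS 0x5555u

static uint32_t sse2_match_mask_scalar_and(const uint16_t *sigs, uint16_t sig)
{
	__m128i cmp = _mm_cmpeq_epi16(_mm_loadu_si128((const __m128i *)sigs),
				      _mm_set1_epi16((short)sig));

	/* movemask sets bits 2*i and 2*i+1 for a match; drop the odd ones. */
	return (uint32_t)_mm_movemask_epi8(cmp) & SIG_MATCH_EVEN_BITS;
}
```

The AND is applied to a 32-bit integer with an immediate operand, so no extra vector constant is needed. Masking with 0xAAAA instead would keep bit 2*i+1, so an all-matching bucket would yield 0xAAAA rather than the 0x5555 produced by the NEON and scalar paths.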

>               /* Compare all signatures in the bucket */
> -             *sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> -                             _mm_load_si128(
> +             __m128i sec_cmp = _mm_cmpeq_epi16(_mm_load_si128(
>                                       (__m128i const *)sec_bkt->sig_current),
> -                             _mm_set1_epi16(sig)));
> +                                     _mm_set1_epi16(sig));
> +             *sec_hash_matches = _mm_movemask_epi8(_mm_and_si128(sec_cmp, shift_mask));
> +             }
>               break;
>  #elif defined(__ARM_NEON)
>       case RTE_HASH_COMPARE_NEON: {
> --
> 2.25.1
>