2016-08-26 22:34, Pablo de Lara:
> From: Byron Marohn <byron.marohn at intel.com>
>
> In lookup bulk function, the signatures of all entries
> are compared against the signature of the key that is being looked up.
> Now that all the signatures are together, they can be compared
> with vector instructions (SSE, AVX2), achieving higher lookup performance.
>
> Also, entries per bucket are increased to 8 when using processors
> with AVX2, as 256 bits can be compared at once, which is the size of
> 8x 32-bit signatures.
Please, would it be possible to use the generic SIMD intrinsics?
We could define generic types compatible with Altivec and NEON:
	__attribute__ ((vector_size (n)))
as described in https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
(a rough sketch is at the end of this mail)

> +/* 8 entries per bucket */
> +#if defined(__AVX2__)

Please prefer #ifdef RTE_MACHINE_CPUFLAG_AVX2

Ideally the vector support could be checked at runtime:
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
It would allow packaging one binary using the best optimization available
(see the second sketch below).

> +	*prim_hash_matches |= _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
> +			_mm256_load_si256((__m256i const *)prim_bkt->sig_current),
> +			_mm256_set1_epi32(prim_hash)));
> +	*sec_hash_matches |= _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
> +			_mm256_load_si256((__m256i const *)sec_bkt->sig_current),
> +			_mm256_set1_epi32(sec_hash)));
> +/* 4 entries per bucket */
> +#elif defined(__SSE2__)
> +	*prim_hash_matches |= _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> +			_mm_load_si128((__m128i const *)prim_bkt->sig_current),
> +			_mm_set1_epi32(prim_hash)));
> +	*sec_hash_matches |= _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> +			_mm_load_si128((__m128i const *)sec_bkt->sig_current),
> +			_mm_set1_epi32(sec_hash)));

By the way, shouldn't the SSE2 path use _mm_cmpeq_epi32? The signatures
are 32-bit values (they are splatted with _mm_set1_epi32), so comparing
with _mm_cmpeq_epi16 matches the two 16-bit halves independently, and
_mm_movemask_ps then only picks up the result of the upper half.

In order to allow such a switch based on register size, we could have an
abstraction in EAL supporting 128/256/512-bit widths for x86/ARM/POWER.
I think aliasing RTE_MACHINE_CPUFLAG_ and RTE_CPUFLAG_ may be enough.
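
To illustrate the first point, here is a rough, untested sketch of the
signature comparison written with the generic GCC vector extensions,
which would compile to SSE/AVX2 on x86 but also to NEON or Altivec.
The type and function names are invented for the example, and it
assumes the 8-entry AVX2 bucket layout with sig_current 32-byte
aligned:

	#include <stdint.h>

	/* 8 x 32-bit signatures in a 256-bit generic vector */
	typedef uint32_t sig_vec_t __attribute__ ((vector_size (32)));

	static inline uint32_t
	compare_signatures(const uint32_t *bkt_sigs, uint32_t hash)
	{
		/* assumes bkt_sigs (i.e. sig_current) is 32-byte aligned */
		const sig_vec_t sigs = *(const sig_vec_t *)bkt_sigs;
		const sig_vec_t key = {hash, hash, hash, hash,
				       hash, hash, hash, hash};
		/* each lane becomes all-ones on match, zero otherwise */
		const sig_vec_t cmp = (sig_vec_t)(sigs == key);
		uint32_t matches = 0;
		unsigned int i;

		/* portable equivalent of _mm256_movemask_ps(): collapse
		 * the per-lane results into a bitmask of matching entries */
		for (i = 0; i < sizeof(cmp) / sizeof(uint32_t); i++)
			matches |= (cmp[i] & 1u) << i;
		return matches;
	}

The compiler turns the == into vpcmpeqd (or the NEON/Altivec
equivalent); only the final mask extraction may need a per-arch helper.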
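
And for the run-time selection, something like this (also untested; the
compare_signatures_* helpers are hypothetical, e.g. the per-ISA variants
of the code above) could be resolved once at hash creation:

	#include <stdint.h>
	#include <rte_cpuflags.h>

	typedef uint32_t (*sig_cmp_fn_t)(const uint32_t *sigs,
			uint32_t hash);

	/* hypothetical per-ISA implementations */
	uint32_t compare_signatures_avx2(const uint32_t *sigs, uint32_t hash);
	uint32_t compare_signatures_sse2(const uint32_t *sigs, uint32_t hash);
	uint32_t compare_signatures_scalar(const uint32_t *sigs, uint32_t hash);

	static sig_cmp_fn_t sig_cmp_fn;

	static void
	select_sig_cmp_fn(void)
	{
	#if defined(RTE_ARCH_X86)
		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
			sig_cmp_fn = compare_signatures_avx2;
		else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
			sig_cmp_fn = compare_signatures_sse2;
		else
	#endif
			sig_cmp_fn = compare_signatures_scalar;
	}

The indirect call is amortized over the bulk lookup, and the same
binary then runs with the best implementation the CPU supports.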