On Sun, Jan 15, 2017 at 12:04:31PM +0000, Pablo de Lara wrote: > Elastic Flow Distributor (EFD) is a distributor library that uses > perfect hashing to determine a target/value for a given incoming flow key. > It has the following advantages: > > - First, because it uses perfect hashing, it does not store > the key itself and hence lookup performance is not dependent > on the key size. > > - Second, the target/value can be any arbitrary value hence > the system designer and/or operator can better optimize service rates > and inter-cluster network traffic locating. > > - Third, since the storage requirement is much smaller than a hash-based > flow table (i.e. better fit for CPU cache), EFD can scale to > millions of flow keys. > Finally, with current optimized library implementation performance > is fully scalable with number of CPU cores. > > Signed-off-by: Byron Marohn <byron.mar...@intel.com> > Signed-off-by: Pablo de Lara <pablo.de.lara.gua...@intel.com> > Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuga...@intel.com> > Acked-by: Christian Maciocco <christian.macio...@intel.com> > --- > +#if (RTE_EFD_VALUE_NUM_BITS == 8 || RTE_EFD_VALUE_NUM_BITS == 16 || \ > + RTE_EFD_VALUE_NUM_BITS == 24 || RTE_EFD_VALUE_NUM_BITS == 32) > +#define EFD_LOAD_SI128(val) _mm_load_si128(val) > +#else > +#define EFD_LOAD_SI128(val) _mm_lddqu_si128(val) > +#endif > + > +static inline efd_value_t > +efd_lookup_internal(const struct efd_online_group_entry * const group, > + const uint32_t hash_val_a, const uint32_t hash_val_b, > + enum rte_efd_compare_function cmp_fn) > +{ > + efd_value_t value = 0; > + uint32_t i; > + > + switch (cmp_fn) { > +#ifdef RTE_MACHINE_CPUFLAG_AVX2 > + case RTE_HASH_COMPARE_AVX2: > + > + i = 0; > + __m256i vhash_val_a = _mm256_set1_epi32(hash_val_a); > + __m256i vhash_val_b = _mm256_set1_epi32(hash_val_b); > +
Could you please abstract and move SIMD specific code to another file like other libraries(example: lib_acl) to enable smooth integration with neon and altivec SIMD implementations in future. > + for (; i < RTE_EFD_VALUE_NUM_BITS; i += 8) { > + __m256i vhash_idx = > + _mm256_cvtepu16_epi32(EFD_LOAD_SI128( > + (__m128i const *) &group->hash_idx[i])); > + __m256i vlookup_table = _mm256_cvtepu16_epi32( > + EFD_LOAD_SI128((__m128i const *) > + &group->lookup_table[i])); > + __m256i vhash = _mm256_add_epi32(vhash_val_a, > + _mm256_mullo_epi32(vhash_idx, > vhash_val_b)); > + __m256i vbucket_idx = _mm256_srli_epi32(vhash, > + EFD_LOOKUPTBL_SHIFT); > + __m256i vresult = _mm256_srlv_epi32(vlookup_table, > + vbucket_idx); > + > + value |= (_mm256_movemask_ps( > + (__m256) _mm256_slli_epi32(vresult, 31)) > + & ((1 << (RTE_EFD_VALUE_NUM_BITS - i)) - 1)) << > i; > + } > + break; > +#endif