05/07/2024 19:43, Yoan Picchi: > On 7/4/24 21:31, David Marchand wrote: > > On Wed, Jul 3, 2024 at 7:13 PM Yoan Picchi <yoan.pic...@arm.com> wrote: > >> --- /dev/null > >> +++ b/lib/hash/compare_signatures_arm_pvt.h > > > > I guess pvt stands for private. > > No need for such suffix, this header won't be exported in any case. > > pvt do stand for private, yes. I had a look at the other lib and what > they used to state a header as private. Several (rcu, ring and stack) > use _pvt so it looks like that's might be the standard? If no, then how > am I supposed to differentiate a public and a private header?
Public headers are prefixed with rte_ We should not use _pvt > >> +#if RTE_HASH_BUCKET_ENTRIES <= 8 > >> + case RTE_HASH_COMPARE_NEON: { > >> + uint16x8_t vmat, vsig, x; > >> + int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7}; > >> + uint16_t low, high; > >> + > >> + vsig = vld1q_dup_u16((uint16_t const *)&sig); > >> + /* Compare all signatures in the primary bucket */ > >> + vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const > >> *)prim_bucket_sigs)); > >> + x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift); > >> + low = (uint16_t)(vaddvq_u16(x)); > >> + /* Compare all signatures in the secondary bucket */ > >> + vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const > >> *)sec_bucket_sigs)); > >> + x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift); > >> + high = (uint16_t)(vaddvq_u16(x)); > >> + *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES; > >> + > >> + } > >> + break; > >> +#endif > >> + default: > >> + for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) > >> { > >> + *hitmask_buffer |= (sig == prim_bucket_sigs[i]) << > >> i; > >> + *hitmask_buffer |= > >> + ((sig == sec_bucket_sigs[i]) << i) << > >> RTE_HASH_BUCKET_ENTRIES; > >> + } > >> + } > >> +} > > > > IIRC, this code is copied in all three headers. > > It is a common scalar version, so the ARM code could simply call the > > "generic" implementation rather than copy/paste. > > Out of the three files, only two versions are the same: generic and arm. > Intel's version do have some padding added (given it's sparse). > I prefer to keep a scalar version in the arm implementation because > that's what match the legacy implementation. We used to be able to > choose (at runtime) to use the scalar path even if we had neon. In > practice the choice ends up being made from #defines, but as far as this > function goes, it is a runtime decision. I have no strong opinion.