Hi Yoan,

On 2024/3/12 7:21, Yoan Picchi wrote:
> - Implemented SVE code for comparing signatures in bulk lookup.
> - Added Defines in code for SVE code support.
> - Optimise NEON code

This commit does not include this part. Please describe only the content of this commit.

> - New SVE code is ~5% slower than optimized NEON for N2 processor.
>
> Signed-off-by: Yoan Picchi <yoan.pic...@arm.com>
> Signed-off-by: Harjot Singh <harjot.si...@arm.com>
> Reviewed-by: Nathan Brown <nathan.br...@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> ---
>  lib/hash/arch/arm/compare_signatures.h | 58 ++++++++++++++++++++++++++
>  lib/hash/rte_cuckoo_hash.c             |  2 +
>  2 files changed, 60 insertions(+)
>
> diff --git a/lib/hash/arch/arm/compare_signatures.h
> b/lib/hash/arch/arm/compare_signatures.h
> index b5a457f936..8a0627e119 100644
> --- a/lib/hash/arch/arm/compare_signatures.h
> +++ b/lib/hash/arch/arm/compare_signatures.h
> @@ -47,6 +47,64 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
>          *hitmask_buffer = vaddvq_u16(hit2);
>          }
>          break;
> +#endif
> +#if defined(RTE_HAS_SVE_ACLE)
> +    case RTE_HASH_COMPARE_SVE: {

...

>  #endif
>      default:
>          for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> index e41f03270a..7a474267f0 100644
> --- a/lib/hash/rte_cuckoo_hash.c
> +++ b/lib/hash/rte_cuckoo_hash.c
> @@ -452,6 +452,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  #elif defined(RTE_ARCH_ARM64)
>      if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
>          h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
> +        if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
> +            h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;

RTE_HASH_COMPARE_SVE was defined in "[PATCH v6 1/4] hash: pack the hitmask for hash in bulk lookup", but its first use is in this commit, so I think it should be defined in this commit.

If RTE_CPUFLAG_SVE and RTE_HAS_SVE_ACLE are both set, the SVE implementation will be chosen.
If RTE_CPUFLAG_SVE is set but RTE_HAS_SVE_ACLE is not, the scalar implementation will be chosen, although in this case we could fall back to the NEON implementation.
So I suggest using "#if defined(RTE_HAS_SVE_ACLE)" directly here.

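To illustrate the two cases above, here is a minimal standalone sketch of the selection, not the actual DPDK code. The enum and the function names are hypothetical stand-ins: has_sve_acle models the build-time RTE_HAS_SVE_ACLE define, and cpu_neon / cpu_sve model the results of rte_cpu_get_flag_enabled(). Keeping the runtime SVE check inside the guard is my assumption about how the suggestion would look.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RTE_HASH_COMPARE_* values. */
enum sig_cmp_fn {
	COMPARE_SCALAR,
	COMPARE_NEON,
	COMPARE_SVE,
};

/* Selection as posted: SVE is picked from the CPU flag alone. */
static enum sig_cmp_fn
select_as_posted(bool cpu_neon, bool cpu_sve)
{
	enum sig_cmp_fn fn = COMPARE_SCALAR;

	if (cpu_neon) {
		fn = COMPARE_NEON;
		if (cpu_sve)
			fn = COMPARE_SVE;
	}
	return fn;
}

/*
 * Models the switch in compare_signatures_dense(): the SVE case is only
 * compiled in with RTE_HAS_SVE_ACLE, so without it an SVE selection
 * falls through to the default (scalar) branch.
 */
static enum sig_cmp_fn
effective_impl(enum sig_cmp_fn fn, bool has_sve_acle)
{
	if (fn == COMPARE_SVE && !has_sve_acle)
		return COMPARE_SCALAR;
	return fn;
}

/* Suggested selection: only consider SVE when the ACLE code is built. */
static enum sig_cmp_fn
select_suggested(bool cpu_neon, bool cpu_sve, bool has_sve_acle)
{
	enum sig_cmp_fn fn = COMPARE_SCALAR;

	if (cpu_neon) {
		fn = COMPARE_NEON;
		if (has_sve_acle && cpu_sve)
			fn = COMPARE_SVE;
	}
	return fn;
}
```

In this model, with cpu_neon and cpu_sve both true and has_sve_acle false, the posted selection ends up on the scalar path, while the guarded selection stays on NEON.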
>      }
>      else
>  #endif
>

Plus:
I notice the commit log says the SVE performance is slower than NEON, and I also notice that on other platforms SVE is also slower than NEON:
1. b4ee9c07bd config/arm: disable SVE ACLE for CN10K
2. 4eea7c6461 config/arm: add SVE ACLE control flag

So maybe we should disable RTE_HAS_SVE_ACLE by default:

diff --git a/config/arm/meson.build b/config/arm/meson.build
index 9d6fb87d7f..a5b890d100 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -875,7 +875,7 @@ endif
 if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
     compile_time_cpuflags += ['RTE_CPUFLAG_SVE']
-    if (cc.check_header('arm_sve.h') and soc_config.get('sve_acle', true))
+    if (cc.check_header('arm_sve.h') and soc_config.get('sve_acle', false))
         dpdk_conf.set('RTE_HAS_SVE_ACLE', 1)
     endif
 endif

If a platform verifies that SVE has higher performance, it could enable SVE by adding "sve_acle: true" to its soc_xxx config.

Thanks