On 3/12/24 03:57, fengchengwen wrote:
Hi Yoan,
On 2024/3/12 7:21, Yoan Picchi wrote:
- Implemented SVE code for comparing signatures in bulk lookup.
- Added Defines in code for SVE code support.
- Optimise NEON code
This commit does not include this part. Please only describe the content of this
commit.
Thank you. I forgot to edit that out after moving commits around.
- New SVE code is ~5% slower than optimized NEON for N2 processor.
Signed-off-by: Yoan Picchi <yoan.pic...@arm.com>
Signed-off-by: Harjot Singh <harjot.si...@arm.com>
Reviewed-by: Nathan Brown <nathan.br...@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
---
lib/hash/arch/arm/compare_signatures.h | 58 ++++++++++++++++++++++++++
lib/hash/rte_cuckoo_hash.c | 2 +
2 files changed, 60 insertions(+)
diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
index b5a457f936..8a0627e119 100644
--- a/lib/hash/arch/arm/compare_signatures.h
+++ b/lib/hash/arch/arm/compare_signatures.h
@@ -47,6 +47,64 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
*hitmask_buffer = vaddvq_u16(hit2);
}
break;
+#endif
+#if defined(RTE_HAS_SVE_ACLE)
+ case RTE_HASH_COMPARE_SVE: {
...
#endif
default:
for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index e41f03270a..7a474267f0 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -452,6 +452,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
#elif defined(RTE_ARCH_ARM64)
if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
+ if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
+ h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;
RTE_HASH_COMPARE_SVE was defined in "[PATCH v6 1/4] hash: pack the hitmask for
hash in bulk lookup",
but its first use is in this commit, so I think it should be defined in this
commit.
If RTE_CPUFLAG_SVE and RTE_HAS_SVE_ACLE are both set, the SVE implementation will be chosen.
If RTE_CPUFLAG_SVE is set but RTE_HAS_SVE_ACLE is not, the scalar implementation will be
chosen, whereas in this case we could fall back to the NEON implementation.
So I suggest using "#if defined(RTE_HAS_SVE_ACLE)" directly here.
Sounds fair. I'll do it.
}
else
#endif
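To make the fallback issue concrete, here is a small model of the two selection variants. It is only a sketch: the enum values CMP_SCALAR/CMP_NEON/CMP_SVE and all function names are hypothetical stand-ins for the RTE_HASH_COMPARE_* values and the code in rte_hash_create(), and the compile-time RTE_HAS_SVE_ACLE guard is modeled as a runtime bool so both configurations can be exercised. It assumes, as described above, that without RTE_HAS_SVE_ACLE the SVE case is compiled out of the switch in compare_signatures_dense() and the selection falls through to the scalar default.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the RTE_HASH_COMPARE_* values. */
enum sig_cmp_fn { CMP_SCALAR, CMP_NEON, CMP_SVE };

/*
 * Implementation that actually runs for a given selection.
 * have_sve_acle models RTE_HAS_SVE_ACLE: without it, the SVE case is
 * not compiled into the switch, so CMP_SVE hits the scalar default.
 */
static enum sig_cmp_fn effective_impl(enum sig_cmp_fn selected, bool have_sve_acle)
{
	if (selected == CMP_SVE && !have_sve_acle)
		return CMP_SCALAR;
	return selected;
}

/* Selection as in v6: keyed only on the runtime CPU flags. */
static enum sig_cmp_fn select_v6(bool cpu_neon, bool cpu_sve)
{
	enum sig_cmp_fn fn = CMP_SCALAR;

	if (cpu_neon) {
		fn = CMP_NEON;
		if (cpu_sve)
			fn = CMP_SVE;
	}
	return fn;
}

/*
 * Suggested selection: only pick SVE when the SVE code was compiled in.
 * In the real code this check would be "#if defined(RTE_HAS_SVE_ACLE)".
 */
static enum sig_cmp_fn select_suggested(bool cpu_neon, bool cpu_sve, bool have_sve_acle)
{
	enum sig_cmp_fn fn = CMP_SCALAR;

	if (cpu_neon) {
		fn = CMP_NEON;
		if (have_sve_acle && cpu_sve)
			fn = CMP_SVE;
	}
	return fn;
}
```

On an SVE-capable CPU built without the ACLE header, effective_impl(select_v6(true, true), false) yields CMP_SCALAR, while the suggested variant keeps NEON.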
Also:
I noticed the commit log says the SVE implementation is slower than NEON.
I have also noticed SVE being slower than NEON on other platforms:
1. b4ee9c07bd config/arm: disable SVE ACLE for CN10K
2. 4eea7c6461 config/arm: add SVE ACLE control flag
So maybe we should disable RTE_HAS_SVE_ACLE by default:
diff --git a/config/arm/meson.build b/config/arm/meson.build
index 9d6fb87d7f..a5b890d100 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -875,7 +875,7 @@ endif
if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
compile_time_cpuflags += ['RTE_CPUFLAG_SVE']
- if (cc.check_header('arm_sve.h') and soc_config.get('sve_acle', true))
+ if (cc.check_header('arm_sve.h') and soc_config.get('sve_acle', false))
dpdk_conf.set('RTE_HAS_SVE_ACLE', 1)
endif
endif
If a platform verifies that SVE has higher performance, it could enable SVE by adding
"sve_acle: true" to its soc_xxx config.
Thanks
Here I kinda disagree. In this particular instance, SVE is a bit slower
with narrow vectors (128b), but could be faster with some wider vector
sizes.
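A rough way to see why vector width could matter here is to count how many vector compares are needed to cover both buckets' signatures. This is a simplified model, not a measurement: it assumes 8 entries per bucket (the value of RTE_HASH_BUCKET_ENTRIES), 16-bit signatures, both buckets compared per lookup, and that a vector-length-agnostic loop packs as many signatures per compare as the vector length allows. It ignores the reduction into the hitmask and other per-call overhead. The names below are hypothetical.

```c
#include <assert.h>

/* Assumed parameters: 8 entries per bucket, 16-bit signatures,
 * primary and secondary bucket compared on each lookup. */
#define BUCKET_ENTRIES     8u
#define SIG_BITS           16u
#define BUCKETS_PER_LOOKUP 2u

/*
 * Number of vector compare steps needed to cover all signatures of both
 * buckets with a vector of vl_bits bits. SVE vector lengths are
 * multiples of 128 bits; a NEON register is always 128 bits.
 */
static unsigned int compare_steps(unsigned int vl_bits)
{
	assert(vl_bits >= 128 && vl_bits % 128 == 0);

	unsigned int lanes = vl_bits / SIG_BITS;
	unsigned int sigs = BUCKET_ENTRIES * BUCKETS_PER_LOOKUP;

	return (sigs + lanes - 1) / lanes; /* ceiling division */
}
```

Under these assumptions, compare_steps(128) is 2 while compare_steps(256) is 1, so a 128-bit implementation does the same amount of vector work as NEON and pays any extra SVE overhead on top, whereas at 256 bits and above both buckets fit in a single compare.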
Even in general, 128-bit SVE is not simply slower than NEON; it varies case by
case. Sometimes it's slower, sometimes it's faster, so I don't think
we should just disable it by default. In any case, disabling it should
be its own patch with proper discussion, not an offhand change
included in the middle of this patch.
This SVE version is still faster than the upstream NEON version; I just
happen to have improved the NEON version even more.