Re: [PATCH v5 09/10] util/bufferiszero: Add simd acceleration for aarch64

2024-02-17 Thread Richard Henderson
On 2/17/24 01:33, Alexander Monakov wrote: > On Fri, 16 Feb 2024, Richard Henderson wrote: >> Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely >> double-check with the compiler flags for __ARM_NEON and don't bother with >> a runtime check. Otherwise, model the loop after the x86

Re: [PATCH v5 09/10] util/bufferiszero: Add simd acceleration for aarch64

2024-02-17 Thread Alexander Monakov
On Fri, 16 Feb 2024, Richard Henderson wrote: > Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely > double-check with the compiler flags for __ARM_NEON and don't bother with > a runtime check. Otherwise, model the loop after the x86 SSE2 function, > and use VADDV to reduc

[PATCH v5 09/10] util/bufferiszero: Add simd acceleration for aarch64

2024-02-16 Thread Richard Henderson
Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely double-check with the compiler flags for __ARM_NEON and don't bother with a runtime check. Otherwise, model the loop after the x86 SSE2 function, and use VADDV to reduce the four vector comparisons. Signed-off-by: Richard He
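
For readers skimming the thread, the approach in the commit message (compile-time __ARM_NEON check, no runtime probe, four vector comparisons reduced with VADDV) could be sketched roughly as below. This is not the code from the patch: the names buffer_is_zero_sketch, bytes_are_zero and HAVE_NEON_SKETCH are hypothetical, the 64-byte block size is an assumption, and the head/tail alignment handling a real implementation would want is left out. The __aarch64__ test is added here because vaddvq_u32 is an A64-only intrinsic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Compile-time check only, as the commit message describes: no runtime probe. */
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON_SKETCH 1   /* hypothetical macro, for this sketch only */
#endif

/* Hypothetical scalar helper: OR all bytes together and test the result. */
static bool bytes_are_zero(const unsigned char *p, size_t len)
{
    unsigned char acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc |= p[i];
    }
    return acc == 0;
}

/* Hypothetical name; a sketch of the idea, not QEMU's buffer_is_zero. */
bool buffer_is_zero_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;

#ifdef HAVE_NEON_SKETCH
    /* Assumed block size: four 16-byte vectors per iteration. */
    while (len >= 64) {
        uint32x4_t v0 = vreinterpretq_u32_u8(vld1q_u8(p));
        uint32x4_t v1 = vreinterpretq_u32_u8(vld1q_u8(p + 16));
        uint32x4_t v2 = vreinterpretq_u32_u8(vld1q_u8(p + 32));
        uint32x4_t v3 = vreinterpretq_u32_u8(vld1q_u8(p + 48));

        /* Four comparisons: a lane becomes all-ones where the input is non-zero. */
        uint32x4_t c0 = vtstq_u32(v0, v0);
        uint32x4_t c1 = vtstq_u32(v1, v1);
        uint32x4_t c2 = vtstq_u32(v2, v2);
        uint32x4_t c3 = vtstq_u32(v3, v3);
        uint32x4_t any = vorrq_u32(vorrq_u32(c0, c1), vorrq_u32(c2, c3));

        /*
         * ADDV reduction. Each lane is 0 or 0xFFFFFFFF, so the wrapped sum
         * of four lanes is zero only if every lane is zero.
         */
        if (vaddvq_u32(any) != 0) {
            return false;
        }
        p += 64;
        len -= 64;
    }
#endif
    /* Remaining bytes, or the whole buffer when NEON is not available. */
    return bytes_are_zero(p, len);
}
```

The comparison step matters in this sketch: summing the raw data lanes directly could wrap around to zero for non-zero input (for example lanes 1 and 0xFFFFFFFF), whereas summing lanes that are each either 0 or all-ones cannot.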