v3: https://patchew.org/QEMU/20240206204809.9859-1-amona...@ispras.ru/

Changes for v4:
  - Keep separate >= 256 entry point, but only keep constant length
    check inline.  This allows the indirect function call to be hidden
    and optimized away when the pointer is constant.
  - Split out a >= 256 integer routine.
  - Simplify acceleration selection for testing.
  - Add function pointer typedef.
  - Implement new aarch64 accelerations.


r~


Alexander Monakov (5):
  util/bufferiszero: Remove SSE4.1 variant
  util/bufferiszero: Remove AVX512 variant
  util/bufferiszero: Reorganize for early test for acceleration
  util/bufferiszero: Remove useless prefetches
  util/bufferiszero: Optimize SSE2 and AVX2 variants

Richard Henderson (5):
  util/bufferiszero: Improve scalar variant
  util/bufferiszero: Introduce biz_accel_fn typedef
  util/bufferiszero: Simplify test_buffer_is_zero_next_accel
  util/bufferiszero: Add simd acceleration for aarch64
  util/bufferiszero: Add sve acceleration for aarch64

 host/include/aarch64/host/cpuinfo.h |   1 +
 include/qemu/cutils.h               |  15 +-
 util/bufferiszero.c                 | 500 ++++++++++++++++------------
 util/cpuinfo-aarch64.c              |   1 +
 meson.build                         |  13 +
 5 files changed, 323 insertions(+), 207 deletions(-)

-- 
2.34.1
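
For readers unfamiliar with the pattern behind the first v4 change, here
is a rough sketch of how an inline constant-length check can sit in front
of an out-of-line >= 256 entry point that hides the indirect call.  This
is not code from the series: every identifier below (zero_fn, accel,
is_zero, is_zero_ool, is_zero_ge256, zero_int_ge256) is hypothetical, and
the integer loop is a simple stand-in for the real routines.  It assumes
GCC or Clang for __builtin_constant_p.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names here are hypothetical and chosen for illustration only. */
typedef bool (*zero_fn)(const void *buf, size_t len);

/* Stand-in integer routine: ORs 8 bytes at a time, then the tail. */
static bool zero_int_ge256(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t acc = 0;
    size_t i;

    for (i = 0; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, p + i, 8);      /* alignment-safe 64-bit load */
        acc |= w;
    }
    for (; i < len; i++) {
        acc |= p[i];
    }
    return acc == 0;
}

/*
 * Currently selected routine.  If a build has only one candidate, this
 * pointer could be declared const, and the compiler would then be free
 * to replace the indirect call below with a direct one.
 */
static zero_fn accel = zero_int_ge256;

/* Out-of-line entry for len >= 256; the indirect call is hidden here. */
bool is_zero_ge256(const void *buf, size_t len)
{
    return accel(buf, len);
}

/* General out-of-line entry: handles short buffers, else dispatches. */
bool is_zero_ool(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    if (len >= 256) {
        return is_zero_ge256(buf, len);
    }
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Only the constant-length test is inlined at each call site. */
static inline bool is_zero(const void *buf, size_t len)
{
    if (__builtin_constant_p(len) && len >= 256) {
        return is_zero_ge256(buf, len);
    }
    return is_zero_ool(buf, len);
}
```

With this shape, a caller passing a compile-time constant length of 256
or more goes straight to the >= 256 entry point, while every other call
pays one out-of-line call that does the length test itself.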