v3: https://patchew.org/QEMU/20240206204809.9859-1-amona...@ispras.ru/

Changes for v4:
  - Keep the separate >= 256 entry point, but keep only the
    constant-length check inline.  This allows the indirect function
    call to be hidden and optimized away when the pointer is constant.
  - Split out a >= 256 integer routine.
  - Simplify acceleration selection for testing.
  - Add function pointer typedef.
  - Implement new aarch64 accelerations.
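
As a rough illustration of the structure described in the list above, here
is a minimal sketch. It is not the code from the series: every name in it
(zero_accel_fn, zero_bytes, zero_words, zero_ool, zero_ge256,
zero_next_accel, is_zero) is hypothetical, the two "accelerations" are plain
portable loops standing in for SIMD routines, and it assumes a GCC or Clang
compiler for __builtin_constant_p.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical typedef for a routine that scans a large buffer. */
typedef bool (*zero_accel_fn)(const void *buf, size_t len);

/* Stand-in "acceleration" 0: byte-at-a-time scan. */
static bool zero_bytes(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Stand-in "acceleration" 1: OR together 8-byte words, then the tail. */
static bool zero_words(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t acc = 0;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, p + i, 8);   /* safe for unaligned input */
        acc |= w;
    }
    for (; i < len; i++) {
        acc |= p[i];
    }
    return acc == 0;
}

/* Table of candidates, best last; the current choice starts at the end. */
static const zero_accel_fn accel_table[] = { zero_bytes, zero_words };
static unsigned accel_index = 1;
static zero_accel_fn zero_accel = zero_words;

/* Test hook: step down to the next candidate; false when none remain. */
static bool zero_next_accel(void)
{
    if (accel_index != 0) {
        zero_accel = accel_table[--accel_index];
        return true;
    }
    return false;
}

/* General entry point: handles every length. */
static bool zero_ool(const void *buf, size_t len)
{
    if (len == 0) {
        return true;
    }
    if (len >= 256) {
        return zero_accel(buf, len);    /* indirect call hidden here */
    }
    return zero_bytes(buf, len);
}

/* Entry point for callers that already know len >= 256. */
static bool zero_ge256(const void *buf, size_t len)
{
    return zero_accel(buf, len);
}

/* Only the constant-length test is inline (GCC/Clang builtin assumed). */
static inline bool is_zero(const void *buf, size_t len)
{
    return (__builtin_constant_p(len) && len >= 256)
           ? zero_ge256(buf, len)
           : zero_ool(buf, len);
}
```

In this sketch a caller passing a compile-time constant length of at least
256 goes straight to zero_ge256() and skips the small-length handling, while
every other call is routed through zero_ool(). A test can loop with
do { ... } while (zero_next_accel()); to exercise each candidate in turn.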


r~


Alexander Monakov (5):
  util/bufferiszero: Remove SSE4.1 variant
  util/bufferiszero: Remove AVX512 variant
  util/bufferiszero: Reorganize for early test for acceleration
  util/bufferiszero: Remove useless prefetches
  util/bufferiszero: Optimize SSE2 and AVX2 variants

Richard Henderson (5):
  util/bufferiszero: Improve scalar variant
  util/bufferiszero: Introduce biz_accel_fn typedef
  util/bufferiszero: Simplify test_buffer_is_zero_next_accel
  util/bufferiszero: Add simd acceleration for aarch64
  util/bufferiszero: Add sve acceleration for aarch64

 host/include/aarch64/host/cpuinfo.h |   1 +
 include/qemu/cutils.h               |  15 +-
 util/bufferiszero.c                 | 500 ++++++++++++++++------------
 util/cpuinfo-aarch64.c              |   1 +
 meson.build                         |  13 +
 5 files changed, 323 insertions(+), 207 deletions(-)

-- 
2.34.1

