Patches 1-3 remove the use of ifunc from the implementation. Patch 5 adjusts the x86 implementation a bit more to take advantage of ptest (in sse4.1) and unaligned accesses (in avx1).
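
For readers unfamiliar with the technique, here is a minimal portable sketch of the idea behind the accelerated zero checks: OR a block of words together and test the combined value once per block. It uses plain 64-bit integers rather than ptest or any vector instruction. The name sketch_buffer_is_zero and the block size of four words are assumptions made for illustration. This is not the code in util/cutils.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only; not QEMU's buffer_is_zero. */
static bool sketch_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t acc = 0;

    assert(buf != NULL || len == 0);

    /* Main loop: OR four 8-byte words together, then test once per block.
     * memcpy keeps the possibly unaligned loads well defined in C. */
    while (len >= 4 * sizeof(uint64_t)) {
        uint64_t w[4];

        memcpy(w, p, sizeof(w));
        if ((w[0] | w[1] | w[2] | w[3]) != 0) {
            return false;
        }
        p += sizeof(w);
        len -= sizeof(w);
    }

    /* Remaining whole words are accumulated without an early exit. */
    while (len >= sizeof(uint64_t)) {
        uint64_t w;

        memcpy(&w, p, sizeof(w));
        acc |= w;
        p += sizeof(w);
        len -= sizeof(w);
    }

    /* Trailing bytes, fewer than one word. */
    while (len > 0) {
        acc |= *p++;
        len--;
    }

    return acc == 0;
}
```

A vectorized variant would presumably keep the same shape, replacing the per-block integer test with a single vector test.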
Patches 2 and 6 are the result of my conversation with Vijaya Kumar with respect to ThunderX.

Patch 7 is the result of seeing some really really horrible code produced for ppc64le (gcc 4.9 and mainline).

This has had limited testing. What I don't know is the best way to benchmark this -- the only way I know to trigger this is via the console, by hand, which doesn't make for reasonable timing.


r~


Richard Henderson (7):
  cutils: Remove SPLAT macro
  cutils: Export only buffer_is_zero
  cutils: Rearrange buffer_is_zero acceleration
  cutils: Add generic prefetch
  cutils: Rewrite x86 buffer zero checking
  cutils: Rewrite aarch64 buffer zero checking
  cutils: Rewrite ppc buffer zero checking

 configure             |  21 +-
 include/qemu/cutils.h |   2 -
 migration/ram.c       |   2 +-
 migration/rdma.c      |   5 +-
 util/cutils.c         | 526 +++++++++++++++++++++++++++++++++-----------------
 5 files changed, 352 insertions(+), 204 deletions(-)

-- 
2.7.4