performance gain on SSE2 is approx. 20-25%. altivec
is not tested. performance for unsigned long arithmetic
is unchanged.

Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Eric Blake <ebl...@redhat.com>
---
 util/cutils.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/util/cutils.c b/util/cutils.c
index 41c627e..0f43c22 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -205,6 +205,11 @@ bool buffer_is_zero(const void *buf, size_t len)
     long d0, d1, d2, d3;
     const long * const data = buf;
 
+    /* use vector optimized zero check if possible */
+    if (can_use_buffer_find_nonzero_offset(buf, len)) {
+        return buffer_find_nonzero_offset(buf, len) == len;
+    }
+
     assert(len % (4 * sizeof(long)) == 0);
     len /= sizeof(long);
 
-- 
1.7.9.5


Reply via email to