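Use buffer_find_nonzero_offset() in buffer_is_zero() when the buffer is aligned to sizeof(VECTYPE) and its length is a multiple of 8 * sizeof(VECTYPE). The performance gain on SSE2 is approx. 20-25%. AltiVec is not tested. Performance with plain unsigned long arithmetic is unchanged.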
Signed-off-by: Peter Lieven <p...@kamp.de>
---
 util/cutils.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/util/cutils.c b/util/cutils.c
index a09d8e8..23f0cd6 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -186,6 +186,11 @@ bool buffer_is_zero(const void *buf, size_t len)
      * latency.
      */
 
+    if (((uintptr_t) buf) % sizeof(VECTYPE) == 0
+        && len % (8 * sizeof(VECTYPE)) == 0) {
+        return buffer_find_nonzero_offset(buf, len) == len;
+    }
+
     size_t i;
     long d0, d1, d2, d3;
     const long * const data = buf;
-- 
1.7.9.5
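The fast-path dispatch in this patch can be sketched as a standalone C file. This is a hypothetical, simplified illustration, not QEMU code: vec_t, VEC_OR, VEC_IS_ZERO, UNROLL, can_use_fast_path(), find_nonzero_offset() and is_zero() are names invented here. The sketch assumes the vector type is SSE2's __m128i when __SSE2__ is defined and unsigned long otherwise, and its slow path is a plain byte loop rather than the unrolled long loop in util/cutils.c. The length check is written with explicit parentheses, because `%` and `*` have the same precedence and associate left to right.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __SSE2__
#include <emmintrin.h>
typedef __m128i vec_t;
#define VEC_OR(a, b) _mm_or_si128((a), (b))
/* All 16 byte lanes equal zero -> movemask has all 16 bits set. */
#define VEC_IS_ZERO(v) \
    (_mm_movemask_epi8(_mm_cmpeq_epi8((v), _mm_setzero_si128())) == 0xFFFF)
#else
/* Fallback: plain unsigned long arithmetic. */
typedef unsigned long vec_t;
#define VEC_OR(a, b) ((a) | (b))
#define VEC_IS_ZERO(v) ((v) == 0)
#endif

/* Hypothetical unroll factor, mirroring the literal 8 in the patch. */
#define UNROLL 8

/* The vector path needs a vec_t-aligned buffer whose length is a whole
 * number of unrolled iterations. Without the parentheses,
 * len % UNROLL * sizeof(vec_t) would parse as (len % UNROLL) * sizeof(vec_t). */
static bool can_use_fast_path(const void *buf, size_t len)
{
    return ((uintptr_t)buf) % sizeof(vec_t) == 0
        && len % (UNROLL * sizeof(vec_t)) == 0;
}

/* Hypothetical stand-in for buffer_find_nonzero_offset(): returns the byte
 * offset of the first UNROLL-vector chunk containing a non-zero byte, or
 * len if the whole buffer is zero. Caller must check can_use_fast_path(). */
static size_t find_nonzero_offset(const void *buf, size_t len)
{
    const vec_t *p = buf;
    size_t n = len / sizeof(vec_t);

    for (size_t i = 0; i < n; i += UNROLL) {
        vec_t acc = p[i];
        /* OR the chunk together so only one zero test is needed per chunk. */
        for (size_t j = 1; j < UNROLL; j++) {
            acc = VEC_OR(acc, p[i + j]);
        }
        if (!VEC_IS_ZERO(acc)) {
            return i * sizeof(vec_t);
        }
    }
    return len;
}

/* Dispatcher in the spirit of the patch: vector path when the
 * preconditions hold, simple byte loop otherwise. */
static bool is_zero(const void *buf, size_t len)
{
    if (can_use_fast_path(buf, len)) {
        return find_nonzero_offset(buf, len) == len;
    }
    const unsigned char *b = buf;
    for (size_t i = 0; i < len; i++) {
        if (b[i] != 0) {
            return false;
        }
    }
    return true;
}
```

With the parentheses, a 40-byte buffer is routed to the slow path. The unparenthesized form `len % 8*sizeof(VECTYPE)` evaluates to `(len % 8) * sizeof(VECTYPE)`, which is zero for every multiple of 8, so it would hand such a buffer to a routine that expects whole unrolled chunks.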