On 22/10/15 20:47, Paolo Bonzini wrote:
>
> On 22/10/2015 19:39, Radim Krčmář wrote:
>> 2015-10-22 18:14+0200, Paolo Bonzini:
>>> On 22/10/2015 18:02, Eric Blake wrote:
>>>> I see a bug in there:
>>>
>>> Of course.  You shouldn't have told me what the bug was, I deserved
>>> to look for it myself. :)
>>
>> It rather seems that you don't want spoilers, :)
>>
>> I see two bugs now.
>
> Me too. :)  But Rusty surely has some testcases in case he wants to
> adopt some of the ideas here. O:-)

For completeness this should address the bugs I think?

bool memeqzero4_paolo(const void *data, size_t length)
{
    const unsigned char *p = data;
    unsigned long word;

    if (!length)
        return true;

    /* Check len bytes not aligned on a word.  */
    while (__builtin_expect(length & (sizeof(word) - 1), 0)) {
        if (*p)
            return false;
        p++;
        length--;
        if (!length)
            return true;
    }

    /* Check up to 16 bytes a word at a time.  */
    for (;;) {
        memcpy(&word, p, sizeof(word));
        if (word)
            return false;
        p += sizeof(word);
        length -= sizeof(word);
        if (!length)
            return true;
        if (__builtin_expect(length & 15, 0) == 0)
            break;
    }

    /* Now we know that's zero, memcmp with self. */
    return memcmp(data, p, length) == 0;
}

compiled with gcc 5.1.1 -march=native -O2 on an i3-2310M
we get these timings:

bytes        1     8    16   512  65536
---------------------------------------------
Rusty:      10    28    59   114   6510
Paolo:       9     9    12    75   6495

It's also smaller, especially at -O3:

$ nm -S a.out | grep memeqzero4
... 000000000000005b t memeqzero4_paolo
... 0000000000000063 t memeqzero4_rusty

$ gcc -march=native -O3 memeqzero.c
$ nm -S a.out | grep memeqzero4
... 000000000000005b t memeqzero4_paolo
... 0000000000000133 t memeqzero4_rusty

cheers,
Pádraig.
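
P.S. In case it helps anyone checking the "memcmp with self" step, here is a minimal sketch of the idea in isolation, with a small cross-check against a plain byte loop. The names memeqzero_naive, memeqzero_selfcmp and selfcmp_mismatches are made up for this illustration, and the fixed 16-byte prefix is an assumption chosen for simplicity. It is not the function above and it does not reproduce the word-at-a-time loop or the timings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reference: plain byte loop, used only to cross-check. */
static bool memeqzero_naive(const void *data, size_t length)
{
    const unsigned char *p = data;
    size_t i;

    for (i = 0; i < length; i++)
        if (p[i])
            return false;
    return true;
}

/* Hypothetical reduced variant: check a short prefix by hand, then
 * compare the buffer against itself shifted by that prefix. */
static bool memeqzero_selfcmp(const void *data, size_t length)
{
    const unsigned char *p = data;
    size_t prefix = length < 16 ? length : 16;
    size_t i;

    assert(data != NULL || length == 0);

    for (i = 0; i < prefix; i++)
        if (p[i])
            return false;

    /* data[0..prefix) is zero.  If data[i] == data[i + prefix] for
     * every remaining i, zero propagates through the whole buffer. */
    return memcmp(data, p + prefix, length - prefix) == 0;
}

/* Set one byte at a time in zeroed buffers of assorted lengths and
 * count disagreements with the naive loop. */
int selfcmp_mismatches(void)
{
    static const size_t lengths[] = { 0, 1, 7, 8, 15, 16, 17, 31, 32, 33, 100 };
    unsigned char buf[128];
    int bad = 0;
    size_t n, i;

    for (n = 0; n < sizeof(lengths) / sizeof(lengths[0]); n++) {
        size_t len = lengths[n];

        memset(buf, 0, sizeof(buf));
        bad += memeqzero_selfcmp(buf, len) != memeqzero_naive(buf, len);
        for (i = 0; i < len; i++) {
            buf[i] = 1;
            bad += memeqzero_selfcmp(buf, len) != memeqzero_naive(buf, len);
            buf[i] = 0;
        }
    }
    return bad;
}
```

Calling selfcmp_mismatches() from a main() and checking that it returns 0 exercises the boundary lengths around the prefix. The same loop could be pointed at memeqzero4_paolo to cover the lengths around the word and 16-byte boundaries.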