On 23/10/2015 13:12, Pádraig Brady wrote:
> On 22/10/15 20:47, Paolo Bonzini wrote:
>>
>>
>> On 22/10/2015 19:39, Radim Krčmář wrote:
>>> 2015-10-22 18:14+0200, Paolo Bonzini:
>>>> On 22/10/2015 18:02, Eric Blake wrote:
>>>>> I see a bug in there:
>>>>
>>>> Of course.  You shouldn't have told me what the bug was, I deserved
>>>> to look for it myself. :)
>>>
>>> It rather seems that you don't want spoilers, :)
>>>
>>> I see two bugs now.
>>
>> Me too. :)  But Rusty surely has some testcases in case he wants to
>> adopt some of the ideas here. O:-)
>
> For completeness this should address the bugs I think?

Yes, thanks! :D

Paolo

> bool memeqzero4_paolo(const void *data, size_t length)
> {
>     const unsigned char *p = data;
>     unsigned long word;
>
>     if (!length)
>         return true;
>
>     /* Check len bytes not aligned on a word.  */
>     while (__builtin_expect(length & (sizeof(word) - 1), 0)) {
>         if (*p)
>             return false;
>         p++;
>         length--;
>         if (!length)
>             return true;
>     }
>
>     /* Check up to 16 bytes a word at a time.  */
>     for (;;) {
>         memcpy(&word, p, sizeof(word));
>         if (word)
>             return false;
>         p += sizeof(word);
>         length -= sizeof(word);
>         if (!length)
>             return true;
>         if (__builtin_expect(length & 15, 0) == 0)
>             break;
>     }
>
>     /* Now we know that's zero, memcmp with self.  */
>     return memcmp(data, p, length) == 0;
> }
>
> compiled with gcc 5.1.1 -march=native -O2 on an i3-2310M
> we get these timings:
>
> bytes       1    8   16   512   65536
> ---------------------------------------------
> Rusty:     10   28   59   114    6510
> Paolo:      9    9   12    75    6495
>
> It's also smaller, especially at -O3:
>
> $ nm -S a.out | grep memeqzero4
> ... 000000000000005b t memeqzero4_paolo
> ... 0000000000000063 t memeqzero4_rusty
> $ gcc -march=native -O3 memeqzero.c
> $ nm -S a.out | grep memeqzero4
> ... 000000000000005b t memeqzero4_paolo
> ... 0000000000000133 t memeqzero4_rusty
>
> cheers,
> Pádraig.
>
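On the testcases point, here is a minimal sketch of a brute-force check for the function quoted above. It is not Rusty's test suite: memeqzero_naive() and memeqzero_selftest() are names made up for illustration, and the buffer sizes are arbitrary. It assumes GCC or Clang, since the quoted code uses __builtin_expect. Every (offset, length) window of a small buffer is compared against a byte-by-byte reference, first all-zero and then with a single nonzero byte at each position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The quoted function, reproduced so the sketch compiles on its own. */
static bool memeqzero4_paolo(const void *data, size_t length)
{
    const unsigned char *p = data;
    unsigned long word;

    if (!length)
        return true;

    /* Check the leading bytes until length is a multiple of the word size. */
    while (__builtin_expect(length & (sizeof(word) - 1), 0)) {
        if (*p)
            return false;
        p++;
        length--;
        if (!length)
            return true;
    }

    /* Check a word at a time until length is a multiple of 16. */
    for (;;) {
        memcpy(&word, p, sizeof(word));
        if (word)
            return false;
        p += sizeof(word);
        length -= sizeof(word);
        if (!length)
            return true;
        if (__builtin_expect(length & 15, 0) == 0)
            break;
    }

    /* The prefix is known to be zero, so compare the rest against it. */
    return memcmp(data, p, length) == 0;
}

/* Hypothetical reference implementation: the obvious byte loop. */
static bool memeqzero_naive(const void *data, size_t length)
{
    const unsigned char *p = data;

    while (length--)
        if (*p++)
            return false;
    return true;
}

/* Hypothetical self-test; returns the number of mismatches found. */
static unsigned memeqzero_selftest(void)
{
    enum { MAXLEN = 96, MAXOFF = 8 };   /* arbitrary small bounds */
    unsigned char buf[MAXLEN + MAXOFF];
    unsigned mismatches = 0;

    memset(buf, 0, sizeof(buf));
    for (size_t off = 0; off < MAXOFF; off++) {
        for (size_t len = 0; len <= MAXLEN; len++) {
            const unsigned char *win = buf + off;

            /* All-zero window. */
            if (memeqzero4_paolo(win, len) != memeqzero_naive(win, len))
                mismatches++;

            /* One nonzero byte at each position inside the window. */
            for (size_t i = 0; i < len; i++) {
                buf[off + i] = 1;
                if (memeqzero4_paolo(win, len) != memeqzero_naive(win, len))
                    mismatches++;
                buf[off + i] = 0;
            }
        }
    }
    return mismatches;
}
```

Calling memeqzero_selftest() from a main() and checking that it returns 0 exercises the byte loop, the word loop and the memcmp tail at every alignment from 0 to 7. It says nothing about speed, only about agreement with the naive loop.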