On 22/10/2015 16:37, Eric Blake wrote:
>> > +  /* Check first 16 bytes manually.  */
>> > +  for (len = 0; len < 16; len++)
>> > +    {
>> > +      if (! bufsize)
>> > +        return true;
>> > +      if (*p)
>> > +        return false;
>> > +      p++;
>> > +      bufsize--;
>> > +    }
>> > +
>> > +  /* Now we know that's zero, memcmp with self.  */
>> > +  return memcmp (buf, p, bufsize) == 0;
>> >  }
> Cool trick of using a suitably-aligned overlap-to-self check to then
> trigger platform-specific speedups without having to rewrite them by
> hand!  qemu is doing a similar check in util/cutils.c:buffer_is_zero()
> that could probably benefit from the same idea.
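
For reference, here is one way the quoted check could look as a self-contained function. The name buffer_is_zero_memcmp and its signature are hypothetical, because the patch's prototype is not quoted. The body follows the quoted loop and memcmp call. Once bytes 0..15 are known to be zero, memcmp (buf, buf + 16, n - 16) succeeds only if every byte equals the byte 16 positions before it, so by induction the whole buffer is zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name and signature; only the body mirrors the quoted patch.  */
static bool
buffer_is_zero_memcmp (const void *buf, size_t bufsize)
{
  const unsigned char *p = buf;
  size_t len;

  /* Check first 16 bytes manually.  */
  for (len = 0; len < 16; len++)
    {
      if (! bufsize)
        return true;
      if (*p)
        return false;
      p++;
      bufsize--;
    }

  /* buf[0..15] are zero and p == buf + 16.  The overlapping compare checks
     buf[i] == buf[i + 16] for every remaining i, so every later byte equals
     some byte of the zero prefix.  The platform's optimized memcmp does the
     heavy lifting.  */
  return memcmp (buf, p, bufsize) == 0;
}
```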

Nice trick indeed.  On the other hand, the first 16 bytes are enough to
rule out 99.99% (number pulled out of thin air) of the non-zero blocks, so
that's where you want to optimize.  Checking them an unsigned long at a
time, or fetching a few unsigned longs and ORing them together, would
probably be the best of both worlds, because you then only use the FPU
in the rare case of a zero buffer.
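
A minimal sketch of that suggestion follows. It assumes a 16-byte prefix and that sizeof (unsigned long) divides 16, which holds for the usual 4- and 8-byte longs. The name buffer_is_zero_fast is hypothetical, and this is not qemu's actual util/cutils.c code. The prefix is loaded as a few unsigned longs and ORed together. memcmp, and any vector/FPU registers a libc implementation of it may use, is reached only when that cheap integer check finds an all-zero prefix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper sketching the "OR a few unsigned longs first" idea.  */
static bool
buffer_is_zero_fast (const void *buf, size_t len)
{
  const unsigned char *p = buf;
  enum { PREFIX = 16 };
  _Static_assert (PREFIX % sizeof (unsigned long) == 0,
                  "prefix must be a whole number of unsigned longs");

  if (len < PREFIX)
    {
      /* Short buffer: a plain byte loop is good enough.  */
      for (size_t i = 0; i < len; i++)
        if (p[i])
          return false;
      return true;
    }

  /* Fetch the prefix as unsigned longs and OR them together.  memcpy
     sidesteps alignment and aliasing issues; compilers emit plain loads.  */
  unsigned long acc = 0;
  for (size_t off = 0; off < PREFIX; off += sizeof (unsigned long))
    {
      unsigned long word;
      memcpy (&word, p + off, sizeof word);
      acc |= word;
    }
  if (acc)
    return false;   /* The common case: a non-zero block is rejected early.  */

  /* Rare case: the prefix is zero, so fall back to memcmp with self.  */
  return memcmp (p, p + PREFIX, len - PREFIX) == 0;
}
```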

Paolo
