Re: [dev] libzahl paper

Markus Wichmann Tue, 14 Jun 2016 12:04:31 -0700

On Mon, Jun 13, 2016 at 11:32:11PM +0200, Mattias Andrée wrote:
> Proofreading and suggestions is greatly appreciated!
>


Page 5: "Branches are incredibly cheap on modern CPUs." Not so! I wrote
a variety of CRC32 algorithms to benchmark (which will just measure my
personal CPU, I know, but it's better than guesswork), and had two
different bitwise versions:

for (i = 0; i < 8; i++)
{
    c = crc & 1;
    crc >>= 1;
    if (c)
        crc ^= 0xedb88320ul;
}

And:

for (i = 0; i < 8; i++)
{
    crc = (crc >> 1) ^ (-(crc & 1) & 0xedb88320ul);
}

The second version had twice the throughput of the first. (Of course
that 4-bytes-at-a-time version had the highest throughput, after that
the times got bigger again, but that's beside the point.) That remained
even when writing the loop in assembly by hand (the compiler refused to
just write this:)

shr eax
jnc 1f
xor eax, edb88320ul
1:

And that even though the second version is longer in assembly. But
eliminating that branch really did increase the throughput immensly.

Although, I don't know what would happen on a different architecture.

Ciao,
Markus

Re: [dev] libzahl paper

Reply via email to