On 20 Jun 2013, at 21:44, Stefan Fuhrmann wrote:
A capable compiler should unroll the inner loop such that we end up with ~10 cycles / 4 bytes. That would be slightly faster than the "* 33" loop.
That depends on a lot of things (such as the latency/throughput of the multiplier).
By the way, the new inner loop suffers from signed overflow (undefined behaviour), and also sign extension when char is signed (which it is on SPARC). Both need to be fixed.
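To make the two problems concrete, here is a minimal sketch. It is not the patch under discussion: the function names, the seed of 0, and the plain "* 33" step are assumptions chosen for illustration. The first helper imitates what a signed-char platform does when a byte with the high bit set is widened; the hash function shows one way to avoid both issues, by reading the key through unsigned char and keeping the accumulator in an unsigned type, where wraparound is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustration only: widen a byte the way a signed-char platform would.
   Bytes 0x80..0xff turn into 0xffffff80..0xffffffff (sign extension). */
static uint32_t widen_as_signed_char(unsigned char b)
{
  return b >= 0x80 ? ((uint32_t)b | 0xffffff00u) : (uint32_t)b;
}

/* Hypothetical "* 33" hash (seed 0 is an assumption). The accumulator is
   unsigned, so overflow wraps modulo 2^32 instead of being undefined, and
   the key is read as unsigned char, so no byte is sign-extended. */
static uint32_t hash_times33(const char *key, size_t len)
{
  const unsigned char *p = (const unsigned char *)key;
  uint32_t hash = 0;
  size_t i;

  assert(key != NULL || len == 0);

  for (i = 0; i < len; i++)
    hash = hash * 33u + p[i];

  return hash;
}
```

With a signed accumulator and plain char, a key containing bytes above 0x7f would mix in different values on a signed-char platform than on an unsigned-char one, and a long enough key would overflow the accumulator.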
I had preferred the other patch for its simplicity. However, I'm fine with the current one and voted for its backport to 1.8.x. It gives us target-independent cache behavior - which is a good thing.
No, it doesn't. The code already produced different hashes on x86 and ppc because of differences in byte order.
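A small sketch of the byte-order point, again with hypothetical helper names and not taken from the actual code: loading four key bytes as one native 32-bit word gives a value that depends on the host, while assembling the word from individual bytes gives the same value everywhere.

```c
#include <stdint.h>
#include <string.h>

/* Native load: the result depends on the host's byte order, so any hash
   built on it differs between little-endian and big-endian machines. */
static uint32_t load_native32(const unsigned char *p)
{
  uint32_t v;
  memcpy(&v, p, sizeof v);
  return v;
}

/* Explicit little-endian load: built from individual bytes, so the
   result is the same on every host. */
static uint32_t load_le32(const unsigned char *p)
{
  return (uint32_t)p[0]
       | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16)
       | ((uint32_t)p[3] << 24);
}
```

For the bytes 01 02 03 04, load_le32 always yields 0x04030201, whereas load_native32 yields 0x04030201 on a little-endian host and 0x01020304 on a big-endian one. Whether the extra shifts are worth the cost in the inner loop is a separate question.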