Tom Herbert <t...@herbertland.com> writes:

> Also, we don't do anything special for alignment, unaligned
> accesses on x86 do not appear to be a performance issue.

This is not true on Atom CPUs. Also, on most CPUs there is still a larger
penalty when an access crosses a cache line.

> Verified correctness by testing arbitrary length buffer filled with
> random data. For each buffer I compared the computed checksum
> using the original algorithm for each possible alignment (0-7 bytes).
>
> Checksum performance:
>
> Isolating old and new implementation for some common cases:

You forgot to state the CPU. The results likely depend heavily on the
microarchitecture. The original C code was optimized for K8, FWIW.

Overall your assembler looks similar to the C code, except for the jump
table. A jump table has the disadvantage that it is much harder to branch
predict, with a large penalty if it's mispredicted.

I would expect it to be slower for cases where the length changes
frequently. Did you benchmark that case?

-Andi
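
For reference, the varying-length case asked about above could be set up
roughly as follows. This is only a sketch under assumptions: csum_ref(),
fill_lengths() and time_case() are hypothetical names, not anything from
the patch, and csum_ref() is a plain byte-wise ones'-complement sum that
merely stands in for the routine under test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the routine under test: 16-bit
 * ones'-complement sum, bytes paired in big-endian order, an odd
 * trailing byte is padded with zero. */
static uint16_t csum_ref(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Fill lens[0..n-1] with either one fixed length (vary == 0) or
 * pseudo-random lengths in [1, max_len] (vary != 0), using a small
 * xorshift32 generator so runs are repeatable. */
static void fill_lengths(size_t *lens, size_t n, size_t max_len,
                         int vary, uint32_t seed)
{
    uint32_t x = seed ? seed : 1;   /* xorshift state must be nonzero */
    size_t i;

    for (i = 0; i < n; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        lens[i] = vary ? 1 + (x % max_len) : max_len;
    }
}

/* Checksum buf once per entry in lens[]. Returns elapsed CPU seconds
 * and stores the accumulated result in *sink so the compiler cannot
 * drop the calls. buf must hold at least max(lens[]) bytes. */
static double time_case(const uint8_t *buf, const size_t *lens, size_t n,
                        uint32_t *sink)
{
    clock_t start = clock();
    uint32_t acc = 0;
    size_t i;

    for (i = 0; i < n; i++)
        acc += csum_ref(buf, lens[i]);
    *sink = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_case() once with lengths from fill_lengths(..., 0, ...) and
once with fill_lengths(..., 1, ...), with the real routine swapped in for
csum_ref(), should show whether the jump table loses out when the length
changes on every call. The stand-in itself has no jump table, so it is
not expected to show the effect.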