On 04.01.2016 00:22, Tom Herbert wrote:
Implement an assembly routine for csum_partial for 64-bit x86. This
primarily speeds up checksum calculation for smaller lengths, such as
those present when doing skb_postpull_rcsum after receiving
CHECKSUM_COMPLETE from the device or after a CHECKSUM_UNNECESSARY
conversion.
This implementation is similar to the csum_partial implemented in
checksum_32.S; however, since we are dealing with 8 bytes at a time,
there are more cases for alignment and small lengths, and for those we
employ jump tables.
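As a rough illustration of what the assembly computes, here is a hedged,
portable C sketch of the ones' complement sum over 8-byte words, with the
carry out of each 64-bit add wrapped back in by hand (the assembly uses adc
for this). Function names are illustrative, not the kernel's; a little-endian
host is assumed for the tail bytes, and alignment/jump-table handling is
omitted. A simple 16-bit reference is included for cross-checking, mirroring
the testing approach described below.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch: ones' complement sum, 8 bytes at a time (illustrative names). */
static uint32_t csum_partial_sketch(const void *buf, size_t len, uint32_t sum)
{
	const uint8_t *p = buf;
	uint64_t acc = sum;

	while (len >= 8) {
		uint64_t w;
		memcpy(&w, p, 8);
		acc += w;
		if (acc < w)		/* carry out of bit 63: end-around wrap */
			acc++;
		p += 8;
		len -= 8;
	}
	if (len) {
		uint64_t w = 0;
		memcpy(&w, p, len);	/* zero-pad the tail (little-endian) */
		acc += w;
		if (acc < w)
			acc++;
	}
	/* Fold the 64-bit accumulator to 32 bits, wrapping carries. */
	acc = (acc & 0xffffffffULL) + (acc >> 32);
	acc = (acc & 0xffffffffULL) + (acc >> 32);
	return (uint32_t)acc;
}

/* Fold a 32-bit partial checksum to 16 bits. */
static uint16_t fold16(uint32_t s)
{
	s = (s & 0xffff) + (s >> 16);
	s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/* 16-bit-at-a-time reference sum for cross-checking the sketch. */
static uint32_t csum_ref(const uint8_t *p, size_t len)
{
	uint64_t acc = 0;

	while (len >= 2) {
		acc += (uint32_t)p[0] | ((uint32_t)p[1] << 8);
		p += 2;
		len -= 2;
	}
	if (len)
		acc += p[0];
	acc = (acc & 0xffffffffULL) + (acc >> 32);
	acc = (acc & 0xffffffffULL) + (acc >> 32);
	return (uint32_t)acc;
}
```

Because ones' complement addition is associative modulo 2^16 - 1, the 8-byte
and 2-byte variants agree once folded to 16 bits, regardless of how the adds
are grouped.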
Testing:
Verified correctness by testing arbitrary-length buffers filled with
random data. For each buffer I compared the computed checksum against
the original algorithm's result for each possible alignment (0-7 bytes).
Checksum performance:
Isolating old and new implementation for some common cases:
                        Old      New
Case                    nsecs    nsecs    Improvement
---------------------+--------+--------+-----------------------------
1400 bytes (0 align)    194.4    176.7    9%  (Big packet)
40 bytes (0 align)       10.5      5.7    45% (IPv6 hdr common case)
8 bytes (4 align)         8.6      7.4    15% (UDP, VXLAN in IPv4)
14 bytes (0 align)       10.4      6.5    37% (Eth hdr)
14 bytes (4 align)       10.8      7.8    27% (Eth hdr in IPv4)
Signed-off-by: Tom Herbert <t...@herbertland.com>
I verified the implementation through tests and can also see a speed-up
in almost all cases. Unfortunately, using the _addcarry_u64 intrinsics or
__int128 to let the compiler emit adc instructions generated even worse
code than this implementation.
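For reference, the __int128 variant alluded to above might look something
like the following hedged sketch: a 128-bit accumulator absorbs all carries
from the 64-bit adds, and everything is folded down at the end. The names
are illustrative and a little-endian host is assumed; as noted, the code
the compiler generated for this style was worse than the hand-written adc
loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of a __int128-accumulator checksum (illustrative names). */
static uint32_t csum_partial_u128(const void *buf, size_t len, uint32_t sum)
{
	const uint8_t *p = buf;
	unsigned __int128 acc = sum;

	while (len >= 8) {
		uint64_t w;
		memcpy(&w, p, 8);
		acc += w;		/* carries land in the high 64 bits */
		p += 8;
		len -= 8;
	}
	if (len) {
		uint64_t w = 0;
		memcpy(&w, p, len);	/* zero-pad the tail (little-endian) */
		acc += w;
	}
	/* Fold 128 -> 64 -> 32 with end-around carries. */
	uint64_t lo = (uint64_t)acc;
	uint64_t hi = (uint64_t)(acc >> 64);

	lo += hi;
	if (lo < hi)
		lo++;
	lo = (lo & 0xffffffffULL) + (lo >> 32);
	lo = (lo & 0xffffffffULL) + (lo >> 32);
	return (uint32_t)lo;
}
```

The appeal of this form is that no explicit carry handling appears in the
loop; the drawback observed here is that compilers do not reliably turn it
into a tight adc chain.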
Acked-by: Hannes Frederic Sowa <han...@stressinduktion.org>
Thanks Tom!