On 04.01.2016 00:22, Tom Herbert wrote:
Implement an assembly routine for csum_partial for 64-bit x86. This
primarily speeds up checksum calculation for smaller lengths, such as
those present when doing skb_postpull_rcsum after getting
CHECKSUM_COMPLETE from the device or after a CHECKSUM_UNNECESSARY
conversion.

This implementation is similar to the csum_partial implemented in
checksum_32.S; however, since we are dealing with 8 bytes at a time,
there are more cases for alignment and small lengths. For those we
employ jump tables.
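For reference, here is a minimal C sketch of the operation csum_partial performs: a ones' complement partial sum, accumulated 8 bytes at a time with carry wrap-around and then folded to 32 bits. This is not the kernel's code and the function name `csum_sketch` is made up for illustration; it only shows the arithmetic the assembly routine implements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch, not the kernel implementation: ones' complement
 * sum over a buffer, 8 bytes at a time, with end-around carry. */
static uint32_t csum_sketch(const void *buf, size_t len, uint32_t sum0)
{
	const unsigned char *p = buf;
	uint64_t sum = sum0;

	while (len >= 8) {
		uint64_t w;

		memcpy(&w, p, 8);	/* unaligned-safe 8-byte load */
		sum += w;
		if (sum < w)		/* carry out of bit 63: wrap it around */
			sum++;
		p += 8;
		len -= 8;
	}

	/* Accumulate the trailing 0-7 bytes (little-endian layout). */
	uint64_t tail = 0;
	for (size_t i = 0; i < len; i++)
		tail |= (uint64_t)p[i] << (8 * i);
	sum += tail;
	if (sum < tail)
		sum++;

	/* Fold 64 bits down to 32, preserving ones' complement carries. */
	sum = (sum & 0xffffffffULL) + (sum >> 32);
	sum = (sum & 0xffffffffULL) + (sum >> 32);
	return (uint32_t)sum;
}
```

The assembly version wins on small lengths by dispatching through jump tables instead of looping over the alignment and tail cases.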

Testing:

Verified correctness by testing arbitrary-length buffers filled with
random data. For each buffer I compared the computed checksum against
that of the original algorithm, for each possible alignment (0-7 bytes).

Checksum performance:

Isolating old and new implementation for some common cases:

                        Old      New
Case                    nsecs    nsecs    Improvement
---------------------+--------+--------+-----------------------------
1400 bytes (0 align)   194.4    176.7       9%    (Big packet)
40 bytes (0 align)      10.5      5.7      45%    (IPv6 hdr common case)
8 bytes (4 align)        8.6      7.4      15%    (UDP, VXLAN in IPv4)
14 bytes (0 align)      10.4      6.5      37%    (Eth hdr)
14 bytes (4 align)      10.8      7.8      27%    (Eth hdr in IPv4)

Signed-off-by: Tom Herbert <t...@herbertland.com>

I verified the implementation through tests and can also see a speed-up in almost all cases. Unfortunately, using the _addcarry_u64 intrinsic or __int128 arithmetic to let the compiler emit adc instructions generated even worse code than the current implementation.
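For context, a sketch of the __int128 variant alluded to above: accumulating into a 128-bit integer so the compiler is free to lower the additions to add/adc pairs. This is an assumed reconstruction (the function name `csum_u128` is invented here), and per the observation above, compilers produced worse code from this shape than the hand-written assembly does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the __int128 approach (GCC/Clang extension): the 128-bit
 * accumulator lets the compiler use adc for the high half instead of
 * explicit end-around carry handling. */
static uint32_t csum_u128(const void *buf, size_t len, uint32_t sum0)
{
	const unsigned char *p = buf;
	unsigned __int128 sum = sum0;

	while (len >= 8) {
		uint64_t w;

		memcpy(&w, p, 8);
		sum += w;	/* carry lands in the high 64 bits */
		p += 8;
		len -= 8;
	}
	for (size_t i = 0; i < len; i++)
		sum += (unsigned __int128)p[i] << (8 * i);

	/* Fold 128 -> 64 -> 32 bits with end-around carry. */
	uint64_t s = (uint64_t)sum + (uint64_t)(sum >> 64);
	if (s < (uint64_t)sum)
		s++;
	s = (s & 0xffffffffULL) + (s >> 32);
	s = (s & 0xffffffffULL) + (s >> 32);
	return (uint32_t)s;
}
```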

Acked-by: Hannes Frederic Sowa <han...@stressinduktion.org>

Thanks Tom!
