85% time reduction on AMD Ryzen 5 5600:

$ ./gltests/bench-crc 1000000
real   1.740296
user   1.740
sys    0.000

$ ../bench-crc-pclmul 1000000
real   0.248324
user   0.248
sys    0.000

This translates to a 13% time reduction for gzip:

$ time ./gzip_sliceby8 -k -d -c large_file.gz > /dev/null

real    0m0.310s
user    0m0.310s
sys     0m0.000s

$ time ./gzip_pclmul -k -d -c large_file.gz > /dev/null

real    0m0.267s
user    0m0.267s
sys     0m0.000s

I haven't added anything to handle unaligned memory (it will just
break). What are your thoughts on adding a new API where the caller
guarantees the memory is aligned, and having the existing API check the
alignment and process the first N bytes normally? Also, I used memcpy to
cheat with alignment for slice-by-8, but could someone recommend a method
for detecting alignment so we can safely cast to a __m128i pointer?

Attachment: 0001-crc-Add-PCLMUL-implementation.patch
Description: Binary data
