85% time reduction on an AMD Ryzen 5 5600:

  $ ./gltests/bench-crc 1000000
  real 1.740296
  user 1.740
  sys 0.000
  $ ../bench-crc-pclmul 1000000
  real 0.248324
  user 0.248
  sys 0.000

This translates to a 13% time reduction for gzip:

  $ time ./gzip_sliceby8 -k -d -c large_file.gz > /dev/null
  real 0m0.310s
  user 0m0.310s
  sys 0m0.000s
  $ time ./gzip_pclmul -k -d -c large_file.gz > /dev/null
  real 0m0.267s
  user 0m0.267s
  sys 0m0.000s

I haven't added anything to deal with unaligned memory (it will just break). What are your thoughts on adding a new API where the caller guarantees that the memory is aligned, and making the existing API check the alignment itself and process the first N bytes normally?

Also, I used memcpy to cheat with alignment for slice-by-8, but could someone recommend a method for detecting alignment so that we can safely cast to a __m128i pointer?
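For illustration, one way the "check alignment and process the first N bytes normally" idea might look is sketched below. It is not taken from the attached patch. The function names (bytes_to_alignment, crc32_update_bytewise, crc32_update_aligned16, crc32_update_any) are hypothetical, and the "aligned" routine is only a stand-in that reuses the bytewise loop where a real PCLMUL kernel would go. The alignment test converts the pointer to uintptr_t and masks the low bits; the result of that conversion is implementation-defined, but it is the address on the usual flat-memory targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes to skip so that P + result is a
   multiple of ALIGNMENT.  ALIGNMENT must be a power of two.  */
static size_t
bytes_to_alignment (const void *p, size_t alignment)
{
  uintptr_t addr = (uintptr_t) p;
  return (size_t) (-addr & (alignment - 1));
}

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320), standing in for
   the generic path.  Takes and returns the raw state; the caller applies
   the initial value and the final inversion.  */
static uint32_t
crc32_update_bytewise (uint32_t state, const unsigned char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      state ^= buf[i];
      for (int k = 0; k < 8; k++)
        state = (state >> 1) ^ (0xEDB88320u & -(state & 1u));
    }
  return state;
}

/* Stand-in for a vector kernel (assumption: it needs a 16-byte-aligned
   buffer and a length that is a multiple of 16).  A real version would
   cast BUF to const __m128i * here.  */
static uint32_t
crc32_update_aligned16 (uint32_t state, const unsigned char *buf, size_t len)
{
  assert ((uintptr_t) buf % 16 == 0 && len % 16 == 0);
  return crc32_update_bytewise (state, buf, len);
}

/* Wrapper that accepts any pointer: unaligned head and short tail go
   through the generic path, the aligned middle through the kernel.  */
uint32_t
crc32_update_any (uint32_t state, const void *data, size_t len)
{
  const unsigned char *p = data;

  size_t head = bytes_to_alignment (p, 16);
  if (head > len)
    head = len;
  state = crc32_update_bytewise (state, p, head);
  p += head;
  len -= head;

  size_t body = len & ~(size_t) 15;   /* largest multiple of 16 <= len */
  if (body > 0)
    state = crc32_update_aligned16 (state, p, body);

  return crc32_update_bytewise (state, p + body, len - body);
}
```

With this split, a caller that already guarantees alignment could call the aligned routine directly, while everyone else keeps using the wrapper. If the head and tail handling turns out to be more trouble than it is worth, _mm_loadu_si128 accepts unaligned addresses and might be a simpler alternative to _mm_load_si128, at whatever cost it has on the target CPU.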
Attachment: 0001-crc-Add-PCLMUL-implementation.patch