This patch improves the performance of the SSE4.2 CRC32C algorithm. The current
SSE4.2 implementation of CRC32C relies on the native crc32 instruction and
processes 8 bytes at a time in a loop. The technique in this patch uses the
pclmulqdq instruction and processes 64 bytes at a time. The algorithm is based
on the SSE4.2 version of the crc32 computation from Chromium's copy of zlib,
with the constants modified for crc32c computation. See:

https://chromium.googlesource.com/chromium/src/+/refs/heads/main/third_party/zlib/crc32_simd.c
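
For readers who want to see the baseline being replaced, here is a minimal
sketch of the "8 bytes at a time" loop built on the native crc32 instruction.
It is not the code from the patch or from the existing tree: the function
names (crc32c_bitwise, crc32c_words, crc32c) are made up for illustration, and
the portable bit-at-a-time fallback is only there so the sketch compiles when
SSE4.2 is not enabled. The intrinsics _mm_crc32_u64 and _mm_crc32_u8 come from
<nmmintrin.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE4_2__) && defined(__x86_64__)
#include <nmmintrin.h>
#endif

/* Hypothetical helper: portable bit-at-a-time CRC32C update using the
 * reflected Castagnoli polynomial 0x82F63B78. Operates on the raw CRC
 * state (no initial or final inversion). */
static uint32_t
crc32c_bitwise(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Hypothetical helper: consume 8 bytes per crc32 instruction, then finish
 * the tail one byte at a time. Falls back to the bitwise loop when the
 * compiler is not targeting SSE4.2 on x86-64. */
static uint32_t
crc32c_words(uint32_t crc, const unsigned char *p, size_t len)
{
#if defined(__SSE4_2__) && defined(__x86_64__)
    while (len >= 8)
    {
        uint64_t chunk;

        memcpy(&chunk, p, 8);   /* avoids unaligned-access assumptions */
        crc = (uint32_t) _mm_crc32_u64(crc, chunk);
        p += 8;
        len -= 8;
    }
    while (len--)
        crc = _mm_crc32_u8(crc, *p++);
    return crc;
#else
    return crc32c_bitwise(crc, p, len);
#endif
}

/* One-shot CRC32C: invert on the way in and on the way out. */
uint32_t
crc32c(const void *data, size_t len)
{
    return crc32c_words(0xFFFFFFFFu, (const unsigned char *) data, len)
        ^ 0xFFFFFFFFu;
}
```

Calling crc32c("123456789", 9) should give the standard CRC-32C check value
0xE3069283 on either path, which makes it a convenient sanity check when
comparing a faster implementation against a reference.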

Microbenchmarks (generated with Google Benchmark using a standalone version of
the same algorithms):

Comparing scalar_crc32c to sse42_crc32c for buffer sizes of 64, 128, 256, 512,
1024, and 2048 bytes:
Benchmark                                    Time      CPU  Time Old  Time New  CPU Old  CPU New
------------------------------------------------------------------------------------------------
[scalar_crc32c vs. sse42_crc32c]/64       -0.8147  -0.8148        33         6       33        6
[scalar_crc32c vs. sse42_crc32c]/128      -0.8962  -0.8962        88         9       88        9
[scalar_crc32c vs. sse42_crc32c]/256      -0.9200  -0.9200       211        17      211       17
[scalar_crc32c vs. sse42_crc32c]/512      -0.9389  -0.9389       486        30      486       30
[scalar_crc32c vs. sse42_crc32c]/1024     -0.9452  -0.9452      1037        57     1037       57
[scalar_crc32c vs. sse42_crc32c]/2048     -0.9456  -0.9456      2140       116     2140      116
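
To read the Time and CPU columns as speedups, the small sketch below converts
a relative change into a speedup factor. It assumes those columns report
(new - old) / old, which is how the Google Benchmark comparison tooling is
normally understood to report differences; the function name is made up for
illustration.

```c
/* Hypothetical helper: convert a relative change, assumed to be
 * (new - old) / old, into a speedup factor old / new. */
double
speedup_from_relative_change(double rel)
{
    /* new = old * (1 + rel), so old / new = 1 / (1 + rel) */
    return 1.0 / (1.0 + rel);
}
```

Under that assumption, the -0.9456 reported for 2048-byte buffers corresponds
to a speedup of roughly 18x, in line with the 2140 vs. 116 timings in the same
row.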

Raghuveer

Attachment: v1-0001-Add-more-test-coverage-for-crc32c.patch

Attachment: v1-0002-Improve-CRC32C-performance-on-SSE4.2.patch
