RE: Review/Pull Request: Adding new CRC32C implementation for IBM S390X

Eduard Stefes Tue, 27 May 2025 01:26:34 -0700

Hi,

I extended the algorithm so that it also works on buffers between 16-64
bytes. Then I ran the performance measurement on two datasets
[^raw_data_1] [^raw_data_2] and created two diagrams [^attachment].


My findings so far:

- the optimized crc32cvx is faster than sb8 for almost all sizes of 16
bytes and above (e.g. 7.6 ms vs. 22.8 ms at 64 bytes)
- the sb8 performance depends heavily on alignment (see the ripples
every 8 bytes)
- the 8 byte ripple is also visible in the vx implementation. As it can
only operate on 16 or 64 byte chunks, it still uses sb8 for the
remaining bytes.
- there is no obvious speed regression in the vx algorithm, except for
the 28 byte row in raw_data_2, which I assume is a fluke, as I am
sharing the system with a bunch of other devs.
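
To illustrate the point about the remaining bytes, here is a minimal
sketch of how a buffer length could be divided between the vector path
and sb8. It is not taken from the patch: the names crc_split and
split_len are hypothetical, and I am assuming that 64 byte chunks are
consumed first, then 16 byte chunks, with the rest left to sb8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not code from the patch. */
typedef struct
{
    size_t blocks64;  /* number of 64 byte chunks for the vector path */
    size_t blocks16;  /* number of 16 byte chunks for the vector path */
    size_t tail_sb8;  /* leftover bytes (0..15) handled by sb8 */
} crc_split;

/*
 * Divide len into 64 byte chunks, then 16 byte chunks, then a tail.
 * Assumption: the vector code prefers the largest chunk size first.
 */
static crc_split
split_len(size_t len)
{
    crc_split s;
    size_t rest = len;

    s.blocks64 = rest / 64;
    rest %= 64;
    s.blocks16 = rest / 16;
    s.tail_sb8 = rest % 16;

    /* The three parts must add up to the original length. */
    assert(s.blocks64 * 64 + s.blocks16 * 16 + s.tail_sb8 == len);
    return s;
}
```

Under these assumptions split_len(28) yields one 16 byte chunk plus a
12 byte sb8 tail, while split_len(8192) yields 128 chunks of 64 bytes
and no tail at all.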


I hope this is acceptable as a performance measurement. However, we
will set up a dedicated performance test and try to get precise numbers
without side effects, but it may take some time until we get to that.

I'll post the updated code together with the other requested
updates.

cheers, Eddy



[^raw_data_1]
bytes   crc32c_sb8      crc32c_vx
4       6.54 ms         6.548 ms
8       4.476 ms        4.47 ms
10      7.346 ms        7.348 ms
12      10.955 ms       10.958 ms
14      14.548 ms       14.546 ms
16      6.837 ms        6.193 ms
32      12.23 ms        6.741 ms
64      22.826 ms       7.6 ms
80      28.536 ms       8.307 ms
96      34.426 ms       9.09 ms
112     40.295 ms       9.844 ms
128     46.053 ms       10.825 ms
144     51.868 ms       11.712 ms
160     65.91 ms        12.122 ms
176     71.649 ms       13.055 ms
192     77.465 ms       11.716 ms
208     83.286 ms       13.532 ms
224     88.991 ms       13.165 ms
240     94.875 ms       13.881 ms
256     100.653 ms      13.147 ms
8192    2967.477 ms     182.911 ms

[^raw_data_2]
bytes   crc32c_sb8      crc32c_vx
4       6.543 ms        6.536 ms
8       4.476 ms        4.47 ms
10      7.35 ms         7.345 ms
12      10.96 ms        10.954 ms
14      14.552 ms       14.588 ms
16      6.843 ms        6.189 ms
18      10.253 ms       9.814 ms
24      9.645 ms        9.924 ms
28      15.957 ms       17.211 ms
32      12.226 ms       6.726 ms
36      18.823 ms       14.484 ms
42      17.855 ms       14.271 ms
48      17.342 ms       7.344 ms
52      24.208 ms       15.306 ms
58      23.525 ms       14.695 ms
64      22.818 ms       7.593 ms



On Thu, 2025-05-08 at 05:32 +0700, John Naylor wrote:
> On Wed, May 7, 2025 at 8:15 PM Aleksander Alekseev
> <aleksan...@timescale.com> wrote:
> > 
> > I didn't review the patch but wanted to point out that when it comes
> > to performance improvements it's typically useful to provide some
> > benchmarks.
> 
> +1 -- It's good to have concrete numbers for the commit message, and
> also to verify improvement on short inputs. There is a test harness in
> the v7-0002 patch from here:
>
> https://www.postgresql.org/message-id/canwcazad5niydbf6q3v_cjapnv05cw-lpxxftmbwdplsz-p...@mail.gmail.com
>
> After building, run the "test-crc.sh" script here after executing
> "CREATE EXTENSION test_crc32c;":
>
> https://www.postgresql.org/message-id/CANWCAZahvhE-%2BhtZiUyzPiS5e45ukx5877mD-dHr-KSX6LcdjQ%40mail.gmail.com
>
> --
> John Naylor
> Amazon Web Services
