Hi, So I worked on the algorithm to also work on buffers between 16-64 bytes. Then I ran the performance measurement on two dataset[^raw_data_1] [^raw_data_2]. And created two diagrams [^attachment].
my findings so far: - the optimized crc32cvx is faster - the sb8 performance is heavily depending on alignment (see the ripples every 8 bytes) - the 8 byte ripple is also visible in the vx implementation. As it can only perform on 16 or 64 byte chunks, it will still use sb8 for the remaining bytes. - there is no obvious speed regression in the vx algorithm. Except raw_data_2-28 which I assume is a fluke. I am sharing the system with a bunch of other devs. I hope this this is acceptable as performance measurement. However we will setup a dedicated performance test and try to get precise numbers without side-effects. But it may take some time until we get to that. I'll post the update on the Code together with the other requested updates. cheers, Eddy [^raw_data_1] bytes crc32c_sb8 crc32c_vx 4 6.54 ms 6.548 ms 8 4.476 ms 4.47 ms 10 7.346 ms 7.348 ms 12 10.955 ms 10.958 ms 14 14.548 ms 14.546 ms 16 6.837 ms 6.193 ms 32 12.23 ms 6.741 ms 64 22.826 ms 7.6 ms 80 28.536 ms 8.307 ms 96 34.426 ms 9.09 ms 112 40.295 ms 9.844 ms 128 46.053 ms 10.825 ms 144 51.868 ms 11.712 ms 160 65.91 ms 12.122 ms 176 71.649 ms 13.055 ms 192 77.465 ms 11.716 ms 208 83.286 ms 13.532 ms 224 88.991 ms 13.165 ms 240 94.875 ms 13.881 ms 256 100.653 ms 13.147 ms 8192 2967.477 ms 182.911 ms [^raw_data_2] bytes crc32c_sb8 crc32c_vx 4 6.543 ms 6.536 ms 8 4.476 ms 4.47 ms 10 7.35 ms 7.345 ms 12 10.96 ms 10.954 ms 14 14.552 ms 14.588 ms 16 6.843 ms 6.189 ms 18 10.253 ms 9.814 ms 24 9.645 ms 9.924 ms 28 15.957 ms 17.211 ms 32 12.226 ms 6.726 ms 36 18.823 ms 14.484 ms 42 17.855 ms 14.271 ms 48 17.342 ms 7.344 ms 52 24.208 ms 15.306 ms 58 23.525 ms 14.695 ms 64 22.818 ms 7.593 ms On Thu, 2025-05-08 at 05:32 +0700, John Naylor wrote: > On Wed, May 7, 2025 at 8:15 PM Aleksander Alekseev > <aleksan...@timescale.com> wrote: > > > > I didn't review the patch but wanted to point out that when it > > comes > > to performance improvements it's typically useful to provide some > > benchmarks. > > +1 -- It's good to have concrete numbers for the commit message, and > also to verify improvement on short inputs. There is a test harness > in > the v7-0002 patch from here: > > https://www.postgresql.org/message-id/canwcazad5niydbf6q3v_cjapnv05cw-lpxxftmbwdplsz-p...@mail.gmail.com > > > > After building, run the "test-crc.sh" script here after executing > "CREATE EXTENSION test_crc32c;": > > https://www.postgresql.org/message-id/CANWCAZahvhE-%2BhtZiUyzPiS5e45ukx5877mD-dHr-KSX6LcdjQ%40mail.gmail.com > > > > > > -- > John Naylor > Amazon Web Services