On Mon, Oct 14, 2024 at 2:26 PM Simon Josefsson via Gnulib discussion list <bug-gnulib@gnu.org> wrote:
>
> Sam Russell <sam.h.russ...@gmail.com> writes:
>
> > I've noticed that GZIP trails behind zlib in performance and part of this
> > is down to the fact that zlib is using a more efficient CRC32
> > implementation. I've written an implementation of this for gnulib based off
> > the Intel paper at
> > https://static.aminer.org/pdf/PDF/000/432/446/a_systematic_approach_to_building_high_performance_software_based_crc.pdf
> > (the code is mine, written based on the paper, the tables are generated by
> > extending the code from RFC 1952 to generate the lookups for partial
> > bitfields, this can be provided on request but it's not my finest work).
> >
> > The code:
> > https://github.com/samrussell/gnulib/commit/2d5f5d0e131feea6e04cb48d56589537506f91a8
> > (yes, I am aware you don't take contributions via github. I have an open
> > ticket with GNU to get my SSH access fixed so I can have non-anonymous
> > access to the repository on Savannah).
>
> Thanks! This looks nice, but please add code that generated the tables
> which is important for reproduction.
>
> I am worried about the size increase with the new tables, what do you
> think about making the new approach optional with some #ifdef, which may
> be off by default to prefer your new optimized variant?
>
> You make several changes to the test vectors: please make them as
> ADDITIONS instead. We need some confidence that the old test vectors
> work. Since the new code works via alignment, please add one test
> vector per string size: 0, 1, 2, 3, ... up to say 20. Did you test this
> on 32-bit and 64-bit platforms? Use valgrind to QA further.
Yes, +1. Changing the existing test was like poking me in the eye with a
finger. Please add additional tests!

> Maybe there is more room for optimization... SSE4.2+ has a hardware
> instruction for CRC. Support for other CRC-32 would be nice too. A
> reasonable specification for it would be nice too, I can find plenty of
> definitions in various RFC's but they are all duplicated.
> https://en.wikipedia.org/wiki/Cyclic_redundancy_check is informative.
> If you want to write an IETF RFC on this, maybe we can collaborate :)

If I recall correctly, SSE4.2 uses CRC polynomial 0x82F63B78. It may not be
the same as libc's polynomial. (The two big ones I am aware of are CRC32
using polynomial 0xEDB88320 and CRC32-C using polynomial 0x82F63B78).

Jeff
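
P.S. To make the polynomial difference concrete, here is a rough sketch of a bit-at-a-time CRC parameterized on the reflected polynomial. It is not the gnulib or zlib code; the names crc32_bitwise, crc_self_check, POLY_CRC32 and POLY_CRC32C are made up for illustration, and I am assuming the usual all-ones initial value and final XOR for both variants. The check values are the commonly published ones for the ASCII string "123456789".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected (LSB-first) forms of the two polynomials discussed above. */
#define POLY_CRC32  0xEDB88320u  /* CRC-32 as used by gzip (RFC 1952) */
#define POLY_CRC32C 0x82F63B78u  /* CRC-32C, the SSE4.2 crc32 instruction */

/* Bit-at-a-time reference CRC: slow, but table-free and easy to audit.
   Hypothetical helper, not an existing gnulib function. */
uint32_t
crc32_bitwise (uint32_t poly, const unsigned char *buf, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;           /* assumed initial value: all ones */
  for (size_t i = 0; i < len; i++)
    {
      crc ^= buf[i];
      for (int bit = 0; bit < 8; bit++)
        /* Shift right; XOR in the polynomial if the bit shifted out was 1. */
        crc = (crc >> 1) ^ (poly & (0u - (crc & 1u)));
    }
  return crc ^ 0xFFFFFFFFu;             /* assumed final XOR: all ones */
}

/* Same input, two polynomials, two different checksums. */
void
crc_self_check (void)
{
  static const unsigned char msg[] = "123456789";
  assert (crc32_bitwise (POLY_CRC32, msg, 9) == 0xCBF43926u);
  assert (crc32_bitwise (POLY_CRC32C, msg, 9) == 0xE3069283u);
  assert (crc32_bitwise (POLY_CRC32, msg, 0) == 0u);  /* empty input */
}
```

A slow reference like this could also serve as the oracle for the length sweep Simon asked for: run it and the optimized routine over the same buffer at lengths 0 through 20 and compare the results.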