Hi Sam,

Thanks for the contribution offer!

> I've noticed that GZIP trails behind zlib in performance and part of this
> is down to the fact that zlib is using a more efficient CRC32
> implementation.

How much of a speedup do you obtain in gzip overall (not in CRC32 alone)
for large files, through these slice-by-4 and slice-by-8 techniques?

> the tables are generated by
> extending the code from RFC 1952 to generate the lookups for partial
> bitfields, this can be provided on request but it's not my finest work).

Yes, if we include your code, we would also include the generator. Doesn't
matter in which programming language it is written: C is fine, Python is
fine, Lisp is fine.

> - free software/patents: the Intel paper is from 2008; zlib had an
> independently-discovered version of slice-by-4 from 2002 so it seems
> unlikely there should be any patents encumbering this.

Good.

> the software is my
> own design and I am happy for it to be transferred to GNU and licensed
> accordingly

Yes, we would need a copyright assignment to the FSF for this code, as it
contains 40 lines of hand-written code (-> legally significant).

> Requests for help:
> - how do I get someone to review my code and then get it added to the
> codebase? is this done via Savannah, or the mailing list, do we email
> around PATCH files etc?

We do it through the mailing list. Patches in 'diff -u' or 'git diff'
format are fine, as are patch files produced by 'git format-patch'. As
attachments, please, not inline. Please provide a ChangeLog entry; we reuse
that ChangeLog entry for the commit message.

> - I assume we'll want to gate this functionality based on CPU ability
> (32-bit arithmetic is presumably fine but 64-bit makes no sense unless
> running on a system with a native 64-bit bus). What is the convention here
> for enforcing this at compile-time?

You could add the file m4/host-cpu-c-abi.m4 to the module, invoke
gl_HOST_CPU_C_ABI_32BIT at configure time, and then dispatch on
HOST_CPU_C_ABI_32BIT.

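As a rough illustration of that dispatch (a sketch only, not code from gzip
or gnulib): as far as I recall, gl_HOST_CPU_C_ABI_32BIT sets a configure-time
shell variable to 'yes', 'no' or 'unknown', so the sketch assumes that
configure.ac turns the 'no' case into a preprocessor macro. The names
CRC32_ABI_IS_64BIT, CRC32_SLICE_WIDTH and crc32_bytes_per_step are
hypothetical, made up for this example.

```c
#include <stddef.h>

/* Hypothetical macro: assumed to be defined by configure.ac only when
   HOST_CPU_C_ABI_32BIT came out as "no" (ABI known to be 64-bit).
   It is not an existing gnulib or gzip identifier.  */
#ifdef CRC32_ABI_IS_64BIT
# define CRC32_SLICE_WIDTH 8   /* slice-by-8: consume 8 input bytes per step */
#else
# define CRC32_SLICE_WIDTH 4   /* "yes" or "unknown": stay with slice-by-4 */
#endif

/* Number of input bytes the inner CRC loop would consume per iteration.  */
static size_t
crc32_bytes_per_step (void)
{
  return CRC32_SLICE_WIDTH;
}
```

Treating 'unknown' like the 32-bit case is a conservative choice made for
this sketch; the other default would work as well.
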
Probably, for embedded systems which don't want to spend 4 KB of cache, it
would be useful to have a #define that disables the entire optimization.

Bruno
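
PS: To make the points about the table generator and the opt-out concrete,
here is a minimal sketch, under my own assumptions and not taken from your
patch. It extends the table loop from RFC 1952 to four tables (4 x 256 x 4
bytes = 4 KB) and falls back to the single 1 KB table when a macro is
defined. The names CRC32_NO_SLICING, CRC32_SLICES, crc_tables,
make_crc_tables and crc32_update are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef CRC32_NO_SLICING        /* hypothetical opt-out for small caches */
# define CRC32_SLICES 1        /* 1 KB: classic byte-at-a-time table */
#else
# define CRC32_SLICES 4        /* 4 KB: slice-by-4 tables */
#endif

static uint32_t crc_tables[CRC32_SLICES][256];

/* Table 0 is the RFC 1952 table; table s holds the CRC of the byte
   followed by s zero bytes, derived from table s-1.  */
static void
make_crc_tables (void)
{
  for (uint32_t n = 0; n < 256; n++)
    {
      uint32_t c = n;
      for (int k = 0; k < 8; k++)
        c = (c & 1) ? 0xedb88320u ^ (c >> 1) : c >> 1;
      crc_tables[0][n] = c;
    }
  for (int s = 1; s < CRC32_SLICES; s++)
    for (int n = 0; n < 256; n++)
      {
        uint32_t prev = crc_tables[s - 1][n];
        crc_tables[s][n] = (prev >> 8) ^ crc_tables[0][prev & 0xff];
      }
}

/* Update a running CRC-32 (start with crc = 0), as in RFC 1952.  */
static uint32_t
crc32_update (uint32_t crc, const unsigned char *buf, size_t len)
{
  uint32_t c = crc ^ 0xffffffffu;
#if CRC32_SLICES == 4
  while (len >= 4)
    {
      /* Assemble 4 bytes explicitly, so this works on any endianness.  */
      c ^= (uint32_t) buf[0] | (uint32_t) buf[1] << 8
           | (uint32_t) buf[2] << 16 | (uint32_t) buf[3] << 24;
      c = crc_tables[3][c & 0xff] ^ crc_tables[2][(c >> 8) & 0xff]
          ^ crc_tables[1][(c >> 16) & 0xff] ^ crc_tables[0][c >> 24];
      buf += 4;
      len -= 4;
    }
#endif
  while (len--)                 /* remaining bytes, one at a time */
    c = crc_tables[0][(c ^ *buf++) & 0xff] ^ (c >> 8);
  return c ^ 0xffffffffu;
}
```

After make_crc_tables (), crc32_update (0, "123456789", 9) should give the
standard CRC-32 check value 0xCBF43926 in both configurations.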