Hi Sam,

Thanks for working on this!

Sam Russell wrote:
> 85% time reduction on AMD Ryzen 5 5600:
> 
> $ ./gltests/bench-crc 1000000
> real   1.740296
> user   1.740
> sys    0.000
> 
> $ ../bench-crc-pclmul 1000000
> real   0.248324
> user   0.248
> sys    0.000
> 
> This translates to a 13% time reduction for gzip:
> 
> $ time ./gzip_sliceby8 -k -d -c large_file.gz > /dev/null
> 
> real    0m0.310s
> user    0m0.310s
> sys     0m0.000s
> 
> $ time ./gzip_pclmul -k -d -c large_file.gz > /dev/null
> 
> real    0m0.267s
> user    0m0.267s
> sys     0m0.000s

This indicates that crc is no longer one of the biggest time consumer for
gzip. Still, a 13% speedup is something we would like to include.

Let me comment regarding the module structure and build system. I'll let
Simon speak up for the rest.

* I would suggest to rename the main source file from crc-pclmul.c to
  crc-x86_64.c.
  Rationale: So that immediately clear that the code is specific to the
  x86_64 CPUs. Not everyone is an assembly language hacker, and even some
  assembly language hackers (like me) don't know about the 'pclmul'
  instruction and of which ISA it is part of. Whereas everyone knows what
  x86_64 is.

* Similarly, I suggest to rename the module 'crc-pclmul' to 'crc-x86_64'.

* I like the fact that you have put the new code in a separate module.
  This will allow packages to invoke gnulib-tool with options
     --avoid=crc-x86_64  crc
  if they don't want to include the CPU specific optimization.

* In the patch, one of the diffs ends with
  \ No newline at end of file
  Could you please add a newline at the end of that source file?

* In crc.c: Would it make sense to cache the value of pclmul_enabled
  in a static variable?
  You can probably guess it by looking at the assembly-language code
  that gcc -O2 produces for the two __builtin_cpu_supports invocations.

* Are the options -mpclmul -mavx understood by both gcc and clang?
  Or does clang use different options for the same thing?

* In the module description, when you write
    CFLAGS += -mpclmul -mavx
  it has an effect on all compilations of the same Makefile (e.g. in the
  case of coreutils, on all the coreutils programs). Which means that the
  resulting binaries will *NOT WORK* on CPUs that don't have AVX extensions.

  So, these CFLAGS changes should be limited to the one compilation unit
  that needs it. Unfortunately, this is not easy to do with Automake (it
  requires an intermediate convenience library to be declared), and I don't
  know whether gnulib-tool already supports this contortion.

  But before I look into it in detail: Is there an alternative to these
  command-line options? Maybe there are some #pragmas that have the same
  effect?
 
> but could someone recommend a method for
> detecting alignment so we can safely upscale to a __m128i pointer?

How about  ((uintptr_t) ptr & 0xF) == 0
or something like that?

Bruno




Reply via email to