https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651

            Bug ID: 83651
           Summary: [7.2 regression] 20% slowdown of linux kernel AES
                    cipher
           Product: gcc
           Version: 7.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: arnd at linaro dot org
  Target Milestone: ---

Following the discussion on PR83356, I did some more performance analysis of
the AES code with various compiler versions by running the in-kernel crypto
selftest (kvm -kernel linux/arch/x86/boot/bzImage -append "tcrypt.mode=200
tcrypt.sec=1 console=ttyS0"  -nographic -serial mon:stdio). This showed a very
clear slowdown with gcc-7.2 (dated 20171130) compared to 7.1. All numbers are
in cycles/byte for AES256+CBC on a 3.1GHz AMD Threadripper; lower numbers are
better:

                default      ubsan         patched        patched+ubsan
gcc-4.3.6 -O2    14.9        ----           14.9         ----
gcc-4.6.4 -O2    15.0        ----           15.8         ----
gcc-4.9.4 -O2    15.5        20.7           15.9         20.9
gcc-5.5.0 -O2    15.6        47.3           86.4         48.8
gcc-6.3.1 -O2    14.6        49.4           94.3         50.9
gcc-7.1.1 -O2    13.5        54.6           15.2         52.0
gcc-7.2.1 -O2    16.8       124.7           92.0         52.2
gcc-8.0.0 -O2    14.6        56.6           15.3         53.5
gcc-7.1.1 -O1    14.6        53.8
gcc-7.2.1 -O1    15.5        55.9
gcc-8.0.0 -O1    15.0        50.7
clang-5 -O1      21.7        58.3
clang-5 -O2      15.5        49.1
handwritten asm  16.4
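
As a rough illustration of how figures like those in the table can be related
to each other, the sketch below converts a throughput into cycles/byte and
computes the relative slowdown between two results. It assumes the benchmark
yields bytes processed per second and that the core runs at a fixed clock; the
report does not say how the cycles/byte values were actually obtained, and the
function names are made up for this example.

```c
#include <assert.h>

/* Convert a measured throughput into cycles per byte.
 * cpu_hz: assumed fixed core clock, e.g. 3.1e9 for the machine in the report.
 * bytes_per_second: throughput reported by the benchmark (assumed input). */
double cycles_per_byte(double cpu_hz, double bytes_per_second)
{
    assert(cpu_hz > 0.0 && bytes_per_second > 0.0);
    return cpu_hz / bytes_per_second;
}

/* Relative slowdown in percent between two cycles/byte figures
 * (positive means the second figure is slower). */
double slowdown_percent(double baseline_cpb, double regressed_cpb)
{
    assert(baseline_cpb > 0.0);
    return (regressed_cpb - baseline_cpb) / baseline_cpb * 100.0;
}
```

For example, slowdown_percent(13.5, 16.8) gives roughly 24%, which is the step
from gcc-7.1.1 to gcc-7.2.1 in the default -O2 column.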

The 'patched' columns are with -ftree-pre and -ftree-sra disabled in the
sources. This happened to help performance on gcc-7.2.1 and to work around
PR83356, but made things worse in most other cases.
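
The report does not show how the two passes were turned off in the sources.
One mechanism GCC offers for this is the optimize function attribute (or the
equivalent #pragma GCC optimize), sketched below on a made-up table lookup
that is not the kernel AES code. All names here are hypothetical, and the GCC
documentation describes this attribute as intended for debugging rather than
production use.

```c
#include <assert.h>
#include <stdint.h>

/* GCC-only: clang does not implement the optimize attribute, so the
 * macro expands to nothing there. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_PRE_SRA __attribute__((optimize("no-tree-pre", "no-tree-sra")))
#else
#define NO_PRE_SRA
#endif

/* Hypothetical stand-in for a cipher lookup table. */
static const uint32_t demo_table[4] = {
    0xc66363a5u, 0xf87c7c84u, 0xee777799u, 0xf67b7b8du
};

static uint32_t rol32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n)); /* callers pass 0 < n < 32 */
}

#define DEMO_ROUND(s, key) \
    (demo_table[(s) & 3] ^ rol32(demo_table[((s) >> 8) & 3], 8) ^ (key))

/* Built with the command-line optimization flags. */
uint32_t demo_round_default(uint32_t s, uint32_t key)
{
    return DEMO_ROUND(s, key);
}

/* Same computation, with -fno-tree-pre and -fno-tree-sra applied to
 * this one function only. */
NO_PRE_SRA
uint32_t demo_round_patched(uint32_t s, uint32_t key)
{
    return DEMO_ROUND(s, key);
}

/* The attribute should only change code generation, never results. */
void demo_self_check(void)
{
    for (uint32_t s = 0; s < 1024; s++)
        assert(demo_round_default(s, 0x1234u) == demo_round_patched(s, 0x1234u));
}
```

Comparing the assembly of the two functions (gcc -O2 -S) is a quick way to see
whether the passes make a difference for a given compiler version.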

For better reproducibility, I tried doing the same with the libressl
implementation of the same cipher, which also has interesting but unfortunately
very different results:

gcc-5.5.0 -O2    49.0
gcc-6.3.1 -O2    48.8
gcc-7.1.1 -O2    59.7
gcc-7.2.1 -O2    60.3
gcc-8.0.0 -O2    59.6

gcc-5.5.0 -O1    59.5
gcc-6.3.1 -O1    48.5
gcc-7.1.1 -O1    51.6
gcc-7.2.1 -O1    51.6
gcc-8.0.0 -O1    51.6

The two implementations are apparently derived from a common source but have
evolved in different ways, and the version from the kernel appears to be much
faster overall. In both cases, we see a ~20% degradation between gcc-6.3.1 and
gcc-7.2.1, but gcc-7.1.1 happens to produce the best results for the kernel
version and very bad results for the libressl sources. The stack consumption
problem from PR83356 does not appear with the libressl sources. I have not
managed to run a ubsan-enabled libressl binary for testing.

To put this in context, both libressl and Linux come with architecture-specific
versions using SIMD registers for most architectures, and those tend to be much
faster, but the C version is used on old x86 CPUs and minor architectures that
lack SIMD registers or an AES implementation for them.

If there is enough interest in addressing the slowdown, it should be possible
to create a version of the kernel AES implementation that can be run in user
space, as the current method of reproducing the results is fairly tedious.
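
A user-space reproducer along those lines could be as small as a timing loop
around the block function. The sketch below is only an outline under stated
assumptions: dummy_block is a placeholder transform, not AES, and the function
names and the 16-byte block callback are invented for this example. The kernel
cipher code would have to be extracted and plugged in where the placeholder
is.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical callback type: transforms one 16-byte block in place. */
typedef void (*block_fn)(uint8_t block[16]);

/* Placeholder transform so the harness can run; this is NOT AES. */
void dummy_block(uint8_t block[16])
{
    for (int i = 0; i < 16; i++)
        block[i] = (uint8_t)((block[i] * 5u + 1u) ^ (unsigned)i);
}

/* Run fn in batches until at least min_seconds (> 0) of CPU time has
 * elapsed, then return the throughput in bytes per second. */
double measure_bytes_per_second(block_fn fn, size_t blocks_per_batch,
                                double min_seconds)
{
    uint8_t block[16];
    size_t total_blocks = 0;
    double elapsed;
    clock_t start;

    memset(block, 0, sizeof(block));
    start = clock();
    do {
        for (size_t i = 0; i < blocks_per_batch; i++)
            fn(block);
        total_blocks += blocks_per_batch;
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    } while (elapsed < min_seconds);

    /* Keep the result live so the loop is not optimized away. */
    volatile uint8_t sink = block[0];
    (void)sink;

    return (double)total_blocks * 16.0 / elapsed;
}
```

Dividing the CPU clock rate by the returned value gives a cycles/byte estimate
comparable across compiler versions, provided the clock frequency is held
constant during the run.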
