https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
Bug ID: 83651
Summary: [7.2 regression] 20% slowdown of linux kernel AES cipher
Product: gcc
Version: 7.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: arnd at linaro dot org
Target Milestone: ---

Following the discussion on PR83356, I did some more performance analysis
of the AES code with various compiler versions, by running the in-kernel
crypto selftest:

  kvm -kernel linux/arch/x86/boot/bzImage -append "tcrypt.mode=200 tcrypt.sec=1 console=ttyS0" -nographic -serial mon:stdio

This showed a very clear slowdown at gcc-7.2 (dated 20171130) compared to
7.1. All numbers are in cycles/byte for AES256+CBC on a 3.1GHz AMD
Threadripper; lower numbers are better:

                       default   ubsan   patched   patched+ubsan
gcc-4.3.6        -O2      14.9    ----      14.9            ----
gcc-4.6.4        -O2      15.0    ----      15.8            ----
gcc-4.9.4        -O2      15.5    20.7      15.9            20.9
gcc-5.5.0        -O2      15.6    47.3      86.4            48.8
gcc-6.3.1        -O2      14.6    49.4      94.3            50.9
gcc-7.1.1        -O2      13.5    54.6      15.2            52.0
gcc-7.2.1        -O2      16.8   124.7      92.0            52.2
gcc-8.0.0        -O2      14.6    56.6      15.3            53.5
gcc-7.1.1        -O1      14.6    53.8
gcc-7.2.1        -O1      15.5    55.9
gcc-8.0.0        -O1      15.0    50.7
clang-5          -O1      21.7    58.3
clang-5          -O2      15.5    49.1
handwritten asm           16.4

The 'patched' columns are with -ftree-pre and -ftree-sra disabled in the
sources, which happened to help on gcc-7.2.1 for performance and to work
around PR83356, but made things worse for most other cases.

For better reproducibility, I tried doing the same with the libressl
implementation of the same cipher, which also has interesting but
unfortunately very different results:

gcc-5.5.0        -O2      49.0
gcc-6.3.1        -O2      48.8
gcc-7.1.1        -O2      59.7
gcc-7.2.1        -O2      60.3
gcc-8.0.0        -O2      59.6
gcc-5.5.0        -O1      59.5
gcc-6.3.1        -O1      48.5
gcc-7.1.1        -O1      51.6
gcc-7.2.1        -O1      51.6
gcc-8.0.0        -O1      51.6

The source code is apparently derived from a common source, but has
evolved in different ways, and the version from the kernel appears to be
much faster overall.
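As a reading aid for the kernel numbers above, cycles/byte can be turned into an approximate throughput. This is only a sketch: it assumes the core runs at its nominal 3.1GHz for the whole test (no boost or frequency scaling), which was not verified, and the names CPU_HZ, kernel_default_o2 and throughput_mb_per_s are made up for illustration.

```python
# Convert cycles/byte into approximate throughput.
# Assumption: the CPU stays at its nominal 3.1 GHz during the test.
CPU_HZ = 3.1e9

# "default" column, -O2, AES256+CBC, copied from the kernel table.
kernel_default_o2 = {
    "gcc-6.3.1": 14.6,
    "gcc-7.1.1": 13.5,
    "gcc-7.2.1": 16.8,
    "gcc-8.0.0": 14.6,
}


def throughput_mb_per_s(cycles_per_byte, cpu_hz=CPU_HZ):
    """Bytes processed per second, expressed in MB/s (10**6 bytes)."""
    return cpu_hz / cycles_per_byte / 1e6


throughput = {
    compiler: throughput_mb_per_s(cpb)
    for compiler, cpb in kernel_default_o2.items()
}

for compiler, mbps in throughput.items():
    cpb = kernel_default_o2[compiler]
    print(f"{compiler}: {cpb:5.1f} cycles/byte, about {mbps:6.1f} MB/s")
```

Under that assumption, 16.8 cycles/byte corresponds to roughly 185 MB/s and 13.5 cycles/byte to roughly 230 MB/s.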
In both cases, we see a ~20% degradation between gcc-6.3.1 and gcc-7.2.1,
but gcc-7.1.1 happens to produce the best results for the kernel version
and very bad results for the libressl sources. The stack consumption
problem from PR83356 does not appear with the libressl sources. I have not
managed to run a ubsan-enabled libressl binary for testing.

To put this in context, both libressl and Linux come with
architecture-specific versions using SIMD registers for most
architectures, and those tend to be much faster, but the C version is used
on old x86 CPUs and minor architectures that lack SIMD registers or an AES
implementation for them.

If there is enough interest in addressing the slowdown, it should be
possible to create a version of the kernel AES implementation that can be
run in user space, as the current method of reproducing the results is
fairly tedious.
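As a quick check of the ~20% figure quoted above, the relative change can be recomputed from the -O2 numbers in the two tables. This is only a sketch: the helper slowdown_percent and the dictionary names are made up for illustration, and the values are copied from the 'default' column of the kernel table and from the libressl list.

```python
def slowdown_percent(before, after):
    """Relative increase in cycles/byte, in percent (positive = slower)."""
    return (after - before) / before * 100.0


# -O2 cycles/byte figures copied from the two tables.
kernel = {"gcc-6.3.1": 14.6, "gcc-7.1.1": 13.5, "gcc-7.2.1": 16.8}
libressl = {"gcc-6.3.1": 48.8, "gcc-7.1.1": 59.7, "gcc-7.2.1": 60.3}

results = {
    "kernel 6.3.1 -> 7.2.1": slowdown_percent(kernel["gcc-6.3.1"], kernel["gcc-7.2.1"]),
    "kernel 7.1.1 -> 7.2.1": slowdown_percent(kernel["gcc-7.1.1"], kernel["gcc-7.2.1"]),
    "libressl 6.3.1 -> 7.2.1": slowdown_percent(libressl["gcc-6.3.1"], libressl["gcc-7.2.1"]),
    "libressl 6.3.1 -> 7.1.1": slowdown_percent(libressl["gcc-6.3.1"], libressl["gcc-7.1.1"]),
}

for label, pct in results.items():
    print(f"{label}: {pct:+.1f}%")
```

With these inputs, the step from gcc-6.3.1 to gcc-7.2.1 comes out at about 15% for the kernel code and about 24% for libressl, and the step from gcc-7.1.1 to gcc-7.2.1 for the kernel code at about 24%.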