https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083
Bug ID: 99083
Summary: Big run-time regressions of 519.lbm_r with LTO
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
CC: ubizjak at gmail dot com
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux
Target: x86_64-linux
On AMD Zen2 CPUs, 519.lbm_r is 62.12% slower when built with -O2 and
-flto than when not using LTO. It is also 62.12% slower than when
using GCC 10 with the two options. My measurements match those from
LNT on a different Zen2:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=325.477.0&plot.1=312.477.0&plot.2=349.477.0&plot.3=278.477.0&plot.4=401.477.0&plot.5=298.477.0
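For readers checking the figures, "N% slower" is read here as the relative increase in run time over the baseline build. A minimal sketch of that arithmetic, assuming run times in seconds; the function name and the timings are hypothetical and chosen only to reproduce the 62.12% figure, they are not measurements from this report:

```python
def percent_slower(baseline_seconds: float, candidate_seconds: float) -> float:
    """Relative run-time increase of the candidate over the baseline, in percent."""
    if baseline_seconds <= 0:
        raise ValueError("baseline run time must be positive")
    return (candidate_seconds - baseline_seconds) / baseline_seconds * 100.0


# Hypothetical timings (assumption, not measured data from the report).
non_lto_seconds = 100.0   # e.g. -O2 without -flto
lto_seconds = 162.12      # e.g. -O2 with -flto

print(f"{percent_slower(non_lto_seconds, lto_seconds):.2f}% slower")
```

Under this reading, a 62.12% slowdown means the LTO binary takes about 1.62 times as long as the non-LTO one.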
On the same CPU, compiling the benchmark with -Ofast -march=native
-flto is slower than non-LTO, by 8.07% on Zen2 and 6.06% on Zen3. The
Zen2 case has also been caught by LNT:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.477.0&plot.1=293.477.0&plot.2=287.477.0&plot.3=286.477.0&
I have bisected both of these regressions (on Zen2s) to:
commit 4c61e35f20fe2ffeb9421dbd6f26c767a234a4a0
Author: Uros Bizjak <ubizjak at gmail dot com>
Date: Wed Dec 9 21:06:07 2020 +0100
i386: Remove REG_ALLOC_ORDER definition
REG_ALLOC_ORDER just defines what the default is set to.
2020-12-09 Uroš Bizjak <ubizjak at gmail dot com>
gcc/
* config/i386/i386.h (REG_ALLOC_ORDER): Remove.
...which looks like it was supposed to be a no-op, but I looked at the
-O2 LTO case and the assembly generated by this commit definitely
differs from the assembly produced by the previous one in instruction
selection, spilling and even some scheduling. For example, I see
hunks like:
@@ -994,10 +996,10 @@
movapd %xmm13, %xmm9
movsd 96(%rsp), %xmm13
subsd %xmm12, %xmm9
- movsd 256(%rsp), %xmm12
+ movq %rbx, %xmm12
+ mulsd %xmm6, %xmm12
movsd %xmm5, 15904(%rdx)
movsd 72(%rax), %xmm5
- mulsd %xmm6, %xmm12
mulsd %xmm0, %xmm9
subsd %xmm10, %xmm5
movsd 216(%rsp), %xmm10
The -Ofast native LTO assemblies also differ.
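A comparison like the hunk above can be made with any diff tool once the assembly from the two compiler revisions is available as text. A minimal sketch using Python's difflib, assuming the listings have already been obtained; the helper name and the file labels "before.s" and "after.s" are hypothetical, and the instruction lines are copied from the hunk above purely as sample input:

```python
import difflib


def asm_diff(before: list[str], after: list[str]) -> list[str]:
    """Return a unified diff of two assembly listings (one instruction per item)."""
    return list(
        difflib.unified_diff(
            before,
            after,
            fromfile="before.s",  # hypothetical label: build at the parent commit
            tofile="after.s",     # hypothetical label: build at the bisected commit
            lineterm="",
            n=1,                  # one line of context around each change
        )
    )


# Sample input taken from the hunk quoted in this report.
before = [
    "subsd %xmm12, %xmm9",
    "movsd 256(%rsp), %xmm12",
    "movsd %xmm5, 15904(%rdx)",
    "movsd 72(%rax), %xmm5",
    "mulsd %xmm6, %xmm12",
    "mulsd %xmm0, %xmm9",
]
after = [
    "subsd %xmm12, %xmm9",
    "movq %rbx, %xmm12",
    "mulsd %xmm6, %xmm12",
    "movsd %xmm5, 15904(%rdx)",
    "movsd 72(%rax), %xmm5",
    "mulsd %xmm0, %xmm9",
]

for line in asm_diff(before, after):
    print(line)
```

In the sample, the stack reload of %xmm12 is replaced by a move from a general-purpose register, and the dependent mulsd moves earlier, which is the kind of spilling and scheduling difference described above.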
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)