https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083
            Bug ID: 99083
           Summary: Big run-time regressions of 519.lbm_r with LTO
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: ubizjak at gmail dot com
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

On AMD Zen2 CPUs, 519.lbm_r is 62.12% slower when built with -O2 and
-flto than when not using LTO. It is also 62.12% slower than when
using GCC 10 with the two options.

My measurements match those from LNT on a different zen2:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=325.477.0&plot.1=312.477.0&plot.2=349.477.0&plot.3=278.477.0&plot.4=401.477.0&plot.5=298.477.0

On the same CPU, compiling the benchmark with -Ofast -march=native
-flto is slower than non-LTO, by 8.07% on Zen2 and 6.06% on Zen3. The
Zen2 case has also been caught by LNT:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.477.0&plot.1=293.477.0&plot.2=287.477.0&plot.3=286.477.0&

I have bisected both of these regressions (on Zen2s) to:

commit 4c61e35f20fe2ffeb9421dbd6f26c767a234a4a0
Author: Uros Bizjak <ubiz...@gmail.com>
Date:   Wed Dec 9 21:06:07 2020 +0100

    i386: Remove REG_ALLOC_ORDER definition

    REG_ALLOC_ORDER just defines what the default is set to.

    2020-12-09  Uroš Bizjak  <ubiz...@gmail.com>

    gcc/
            * config/i386/i386.h (REG_ALLOC_ORDER): Remove

...which looks like it was supposed to be a no-op, but I looked at the
-O2 LTO case and the assembly generated by this commit definitely
differs from the assembly produced by the previous one in instruction
selection, spilling and even some scheduling.

For example, I see hunks like:

@@ -994,10 +996,10 @@
        movapd  %xmm13, %xmm9
        movsd   96(%rsp), %xmm13
        subsd   %xmm12, %xmm9
-       movsd   256(%rsp), %xmm12
+       movq    %rbx, %xmm12
+       mulsd   %xmm6, %xmm12
        movsd   %xmm5, 15904(%rdx)
        movsd   72(%rax), %xmm5
-       mulsd   %xmm6, %xmm12
        mulsd   %xmm0, %xmm9
        subsd   %xmm10, %xmm5
        movsd   216(%rsp), %xmm10

The -Ofast native LTO assemblies also differ.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)