https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083
Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
             Status|RESOLVED                     |REOPENED
           Keywords|                             |patch
           Assignee|unassigned at gcc dot gnu.org|ubizjak at gmail dot com
         Resolution|FIXED                        |---

--- Comment #13 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Martin Jambor from comment #12)
> For the record, I have benchmarked the patches from comment #4 and comment
> #10 on top of commit 6b1633378b7 (for which I already have unpatched
> benchmark results) and the regression of 519.lbm_r compiled with -O2 LTO
> dropped from 62% to 8%.
>
> The -Ofast -march=native -flto vs. non-LTO regression also dropped from 8%
> to about 5% (GCC 10 also has non-LTO 2.5% faster than LTO, but at least both
> times improved vs. GCC 10).
>
> The only notable regression brought about by the patch was 538.imagick_r when
> compiled at -Ofast -march=native without LTO, which was 6% slower with the
> patch.
>
> All of the measurements were done on a Zen2 machine.
>
> Thank you for reverting the patch, now we need to look for LNT to pick up
> the changes.

The complete patch that presumably corrects HONOR_REG_ALLOC_ORDER usage is at
[1], but IIUC the above measurements, there is still a regression of 8% vs. the
unpatched compiler.

With the complete patch [1], the ira_better_spill_reload_regno_p change should
be a NO-OP, but the new default also disables the internal calculations in
assign_hard_reg; please see [2] for the reasoning.

Based on the above benchmarks, it looks like disabling the internal
calculations in assign_hard_reg is harmful even for HONOR_REG_ALLOC_ORDER
targets; at least the patched x86 compiler shows this effect. Maybe Vlad could
comment on this part.

Let's reopen this PR to keep the discussion in one place.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565640.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565699.html