https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103454
Bug ID: 103454
Summary: -finline-functions-called-once is both compile-time
and runtime loss at average for spec2006, spec2017 and
tramp3d
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
Looking into exchange2 performance, I ran benchmarks with
-fno-inline-functions-called-once. It seems we have significant regressions
here.
The following compares default flags (base) with runs using the additional
-fno-inline-functions-called-once:
https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&all_changes=on&min_percentage_change=0.01&revisions=c53447034965e4191a8738f045a3c7d1552d5f59%2C67b183fac7b08067fdd3c09abd3efd2691083395&include_user_branches=on
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_changes=on&min_percentage_change=0.01&revisions=c53447034965e4191a8738f045a3c7d1552d5f59%2C67b183fac7b08067fdd3c09abd3efd2691083395&include_user_branches=on
Large differences are:
Default flags wins:
- fatigue2 with both -O2 and -Ofast: inlining wins by 40%
-fno-inline-functions-called-once wins:
- tramp3d with -Ofast: 31%
- exchange2 with -Ofast: 11-21%
- specfp2006 total build time: 41% (mostly wrf, which builds 71% faster)
- specint2006 total: about 1.5-3%
- specfp2017 total: 64% (again mostly wrf)
- specint2017 total: 2.5-3.5%
Once more tests are run I can make a better summary. It has been a couple of
releases since I last benchmarked -fno-inline-functions-called-once, so I am
not quite sure how long we have had the problem.
For exchange2 the problem is inlining the different clones of digits2 into
each other. Each clone of digits2 has a 9-deep loop nest and calls the other
clone from the innermost loop. I guess we may want a loop-depth limit on
inlining functions called once, and also give it its own specific
large-function-insns and growth limits (in particular, I think the growth
limit wants to be smaller, say 10%, instead of letting the function grow
twice).
It also shows, however, that we have middle-end problems in both scalability
and code quality on large CFGs, which are probably quite important (and
annoying) to track down.