https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103454
Bug ID: 103454
Summary: -finline-functions-called-once is both compile-time and runtime loss at average for spec2006, spec2017 and tramp3d
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---

Looking into exchange2 performance I ran benchmarks with -fno-inline-functions-called-once. It seems we do have important regressions here. The following compares the default flags (base) with a run that adds -fno-inline-functions-called-once:

https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&all_changes=on&min_percentage_change=0.01&revisions=c53447034965e4191a8738f045a3c7d1552d5f59%2C67b183fac7b08067fdd3c09abd3efd2691083395&include_user_branches=on
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_changes=on&min_percentage_change=0.01&revisions=c53447034965e4191a8738f045a3c7d1552d5f59%2C67b183fac7b08067fdd3c09abd3efd2691083395&include_user_branches=on

Large differences are:

Default flags wins:
 - fatigue2 with both -O2 and -Ofast inlining: 40%

-fno-inline-functions-called-once wins:
 - tramp3d with -Ofast: 31%
 - exchange2 with -Ofast: 11-21%
 - specfp2006 total build time: 41% (mostly wrf, which builds 71% faster)
 - specint2006 total: about 1.5-3%
 - specfp2017 total: 64% (again mostly wrf)
 - specint2017 total: 2.5-3.5%

Once more tests are run I can make a better summary. It has been a couple of releases since I last benchmarked -fno-inline-functions-called-once, so I am not quite sure how long we have had the problem.

For exchange2 the problem is inlining different clones of digits2 into each other. Each clone of digits2 has 9 nested loops and calls the other clone from the innermost one.
I guess we may want to have a loop-depth limit on inlining-once, and also give it its own specific large-function-insns and growth limits (in particular, I think the growth limit wants to be smaller, say 10%, instead of letting the function grow twice). It also shows, however, that we have middle-end problems in both scalability and code quality on large CFGs, which are probably quite important (and annoying) to track down.
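As an illustration of the kind of experiment involved (file name is a placeholder; the params shown are the existing generic knobs, not the proposed inline-once-specific ones, which do not exist yet):

```shell
# Compare default inlining against disabling inline-once entirely:
gcc -Ofast -c tramp3d.cpp
gcc -Ofast -fno-inline-functions-called-once -c tramp3d.cpp

# Today large-function-growth defaults to 100 (i.e. the inliner may let
# a function grow to twice its size); the proposal above is a much
# smaller cap, e.g. 10%, specifically for the called-once case:
gcc -Ofast --param large-function-growth=10 -c tramp3d.cpp
```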