https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #27 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
The profile_estimate issue is still here; the inliner and early inliner issues seem solved. It seems that ipa_profile just orders the nodes for propagation the wrong way: we propagate from callers to callees, while the toposorter is meant for propagation in the opposite direction. operand_scan seems slow too.

Time variable                        usr           sys          wall                GGC
phase setup                      :  0.00 (  0%)   0.00 (  0%)   0.00 (  0%)     1237 kB (  0%)
phase parsing                    :  6.63 (  9%)   6.77 ( 77%)  13.41 ( 17%)   655497 kB ( 20%)
phase opt and generate           : 64.47 ( 91%)   2.07 ( 23%)  66.57 ( 83%)  2603397 kB ( 80%)
garbage collection               :  0.64 (  1%)   0.00 (  0%)   0.65 (  1%)        0 kB (  0%)
dump files                       :  0.05 (  0%)   0.01 (  0%)   0.04 (  0%)        0 kB (  0%)
callgraph construction           :  0.91 (  1%)   0.01 (  0%)   0.83 (  1%)   399235 kB ( 12%)
callgraph optimization           :  0.37 (  1%)   0.00 (  0%)   0.43 (  1%)        0 kB (  0%)
callgraph functions expansion    : 15.98 ( 22%)   1.20 ( 14%)  17.18 ( 21%)   297309 kB (  9%)
callgraph ipa passes             : 40.57 ( 57%)   0.40 (  5%)  40.99 ( 51%)   617751 kB ( 19%)
ipa function summary             :  0.14 (  0%)   0.00 (  0%)   0.14 (  0%)     1807 kB (  0%)
ipa dead code removal            :  0.22 (  0%)   0.00 (  0%)   0.24 (  0%)        0 kB (  0%)
ipa cp                           :  0.97 (  1%)   0.03 (  0%)   1.03 (  1%)   327514 kB ( 10%)
ipa inlining heuristics          :  0.72 (  1%)   0.00 (  0%)   0.63 (  1%)    84183 kB (  3%)
ipa function splitting           :  0.02 (  0%)   0.00 (  0%)   0.05 (  0%)        0 kB (  0%)
ipa various optimizations        :  0.69 (  1%)   0.20 (  2%)   0.89 (  1%)   128398 kB (  4%)
ipa reference                    :  0.05 (  0%)   0.00 (  0%)   0.05 (  0%)        0 kB (  0%)
ipa profile                      : 18.24 ( 26%)   0.00 (  0%)  18.25 ( 23%)        0 kB (  0%)
ipa pure const                   :  0.45 (  1%)   0.00 (  0%)   0.46 (  1%)        0 kB (  0%)
ipa icf                          :  0.17 (  0%)   0.02 (  0%)   0.17 (  0%)        0 kB (  0%)
ipa SRA                          :  0.21 (  0%)   0.00 (  0%)   0.21 (  0%)      102 kB (  0%)
ipa free inline summary          :  0.03 (  0%)   0.00 (  0%)   0.04 (  0%)        0 kB (  0%)
cfg cleanup                      :  0.00 (  0%)   0.01 (  0%)   0.02 (  0%)        0 kB (  0%)
trivially dead code              :  0.12 (  0%)   0.03 (  0%)   0.12 (  0%)        0 kB (  0%)
df scan insns                    :  0.85 (  1%)   0.14 (  2%)   1.28 (  2%)       46 kB (  0%)
df multiple defs                 :  0.30 (  0%)   0.06 (  1%)   0.31 (  0%)        0 kB (  0%)
df reaching defs                 :  0.69 (  1%)   0.05 (  1%)   0.63 (  1%)        0 kB (  0%)
df live regs                     :  0.49 (  1%)   0.02 (  0%)   0.57 (  1%)        0 kB (  0%)
df live&initialized regs         :  0.19 (  0%)   0.01 (  0%)   0.12 (  0%)        0 kB (  0%)
df must-initialized regs         :  0.10 (  0%)   0.00 (  0%)   0.10 (  0%)        0 kB (  0%)
df use-def / def-use chains      :  0.44 (  1%)   0.05 (  1%)   0.40 (  1%)        0 kB (  0%)
df reg dead/unused notes         :  1.35 (  2%)   0.09 (  1%)   1.15 (  1%)      747 kB (  0%)
register information             :  0.16 (  0%)   0.00 (  0%)   0.18 (  0%)        0 kB (  0%)
alias analysis                   :  0.16 (  0%)   0.00 (  0%)   0.11 (  0%)      436 kB (  0%)
alias stmt walking               :  0.49 (  1%)   0.07 (  1%)   0.67 (  1%)        0 kB (  0%)
register scan                    :  0.04 (  0%)   0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
rebuild jump labels              :  0.00 (  0%)   0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
preprocessing                    :  2.37 (  3%)   2.37 ( 27%)   4.49 (  6%)   383477 kB ( 12%)
lexical analysis                 :  1.88 (  3%)   2.13 ( 24%)   4.20 (  5%)        0 kB (  0%)
parser (global)                  :  0.01 (  0%)   0.01 (  0%)   0.03 (  0%)     1442 kB (  0%)
parser function body             :  2.19 (  3%)   2.26 ( 26%)   4.50 (  6%)   270577 kB (  8%)
early inlining heuristics        :  2.80 (  4%)   0.03 (  0%)   2.81 (  4%)     3076 kB (  0%)
inline parameters                :  6.43 (  9%)   0.14 (  2%)   6.74 (  8%)    31127 kB (  1%)
integration                      :  0.17 (  0%)   0.00 (  0%)   0.08 (  0%)     6789 kB (  0%)
tree gimplify                    :  1.01 (  1%)   0.03 (  0%)   1.15 (  1%)   610970 kB ( 19%)
tree eh                          :  0.50 (  1%)   0.03 (  0%)   0.44 (  1%)        0 kB (  0%)
tree CFG construction            :  3.50 (  5%)   0.02 (  0%)   3.74 (  5%)   628087 kB ( 19%)
tree CFG cleanup                 :  0.69 (  1%)   0.03 (  0%)   0.67 (  1%)        0 kB (  0%)
tree tail merge                  :  0.01 (  0%)   0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
tree VRP                         :  0.09 (  0%)   0.03 (  0%)   0.15 (  0%)     2241 kB (  0%)
tree Early VRP                   :  0.06 (  0%)   0.00 (  0%)   0.08 (  0%)     1047 kB (  0%)
tree copy propagation            :  0.07 (  0%)   0.02 (  0%)   0.10 (  0%)        0 kB (  0%)
tree PTA                         :  0.40 (  1%)   0.02 (  0%)   0.37 (  0%)       93 kB (  0%)
tree SSA rewrite                 :  1.41 (  2%)   0.03 (  0%)   1.48 (  2%)    90326 kB (  3%)
tree SSA other                   :  0.15 (  0%)   0.00 (  0%)   0.13 (  0%)      140 kB (  0%)
tree SSA incremental             :  0.05 (  0%)   0.00 (  0%)   0.10 (  0%)        0 kB (  0%)
tree operand scan                :  7.64 ( 11%)   0.26 (  3%)   7.95 ( 10%)    95305 kB (  3%)
dominator optimization           :  0.03 (  0%)   0.00 (  0%)   0.07 (  0%)      155 kB (  0%)
backwards jump threading         :  0.01 (  0%)   0.00 (  0%)   0.02 (  0%)        0 kB (  0%)
isolate eroneous paths           :  0.00 (  0%)   0.00 (  0%)   0.02 (  0%)        0 kB (  0%)
tree CCP                         :  0.10 (  0%)   0.00 (  0%)   0.15 (  0%)        0 kB (  0%)
tree PRE                         :  0.19 (  0%)   0.00 (  0%)   0.19 (  0%)     1276 kB (  0%)
tree FRE                         :  0.15 (  0%)   0.05 (  1%)   0.25 (  0%)      701 kB (  0%)
tree code sinking                :  0.01 (  0%)   0.00 (  0%)   0.00 (  0%)        0 kB (  0%)
tree linearize phis              :  0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     1042 kB (  0%)
tree forward propagate           :  0.07 (  0%)   0.02 (  0%)   0.13 (  0%)        0 kB (  0%)
tree conservative DCE            :  0.47 (  1%)   0.13 (  1%)   0.52 (  1%)        0 kB (  0%)
tree aggressive DCE              :  0.23 (  0%)   0.03 (  0%)   0.23 (  0%)     2090 kB (  0%)
tree DSE                         :  0.03 (  0%)   0.00 (  0%)   0.02 (  0%)        0 kB (  0%)
PHI merge                        :  0.01 (  0%)   0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
gimple widening/fma detection    :  0.01 (  0%)   0.00 (  0%)   0.00 (  0%)        0 kB (  0%)
tree strlen optimization         :  0.01 (  0%)   0.00 (  0%)   0.02 (  0%)     1042 kB (  0%)
dominance computation            :  0.24 (  0%)   0.03 (  0%)   0.18 (  0%)        0 kB (  0%)
out of ssa                       :  0.02 (  0%)   0.00 (  0%)   0.02 (  0%)        0 kB (  0%)
expand                           :  0.22 (  0%)   0.07 (  1%)   0.38 (  0%)   128974 kB (  4%)
post expand cleanups             :  0.04 (  0%)   0.00 (  0%)   0.04 (  0%)      303 kB (  0%)
forward prop                     :  0.26 (  0%)   0.03 (  0%)   0.16 (  0%)        0 kB (  0%)
CSE                              :  0.12 (  0%)   0.07 (  1%)   0.17 (  0%)        0 kB (  0%)
dead code elimination            :  0.09 (  0%)   0.00 (  0%)   0.11 (  0%)        0 kB (  0%)
dead store elim1                 :  0.25 (  0%)   0.00 (  0%)   0.30 (  0%)    11613 kB (  0%)
dead store elim2                 :  0.30 (  0%)   0.00 (  0%)   0.33 (  0%)    11613 kB (  0%)
loop init                        :  0.01 (  0%)   0.00 (  0%)   0.03 (  0%)     4103 kB (  0%)
loop fini                        :  0.01 (  0%)   0.00 (  0%)   0.00 (  0%)        0 kB (  0%)
CPROP                            :  0.00 (  0%)   0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
CSE 2                            :  0.14 (  0%)   0.02 (  0%)   0.18 (  0%)       23 kB (  0%)
branch prediction                :  0.11 (  0%)   0.00 (  0%)   0.04 (  0%)      101 kB (  0%)
combiner                         :  0.21 (  0%)   0.01 (  0%)   0.24 (  0%)        0 kB (  0%)
integrated RA                    :  1.22 (  2%)   0.02 (  0%)   1.38 (  2%)    23989 kB (  1%)
LRA non-specific                 :  0.41 (  1%)   0.00 (  0%)   0.44 (  1%)       54 kB (  0%)
LRA virtuals elimination         :  0.02 (  0%)   0.00 (  0%)   0.00 (  0%)        0 kB (  0%)
LRA reload inheritance           :  0.04 (  0%)   0.00 (  0%)   0.03 (  0%)        0 kB (  0%)
LRA hard reg assignment          :  0.00 (  0%)   0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
reload                           :  0.00 (  0%)   0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
reload CSE regs                  :  0.33 (  0%)   0.00 (  0%)   0.30 (  0%)       46 kB (  0%)
ree                              :  0.07 (  0%)   0.00 (  0%)   0.16 (  0%)        0 kB (  0%)
thread pro- & epilogue           :  0.61 (  1%)   0.00 (  0%)   0.33 (  0%)      855 kB (  0%)
peephole 2                       :  0.05 (  0%)   0.01 (  0%)   0.06 (  0%)        0 kB (  0%)
hard reg cprop                   :  0.11 (  0%)   0.00 (  0%)   0.15 (  0%)        0 kB (  0%)
scheduling 2                     :  2.64 (  4%)   0.00 (  0%)   2.58 (  3%)      244 kB (  0%)
machine dep reorg                :  0.08 (  0%)   0.01 (  0%)   0.09 (  0%)        0 kB (  0%)
reorder blocks                   :  0.01 (  0%)   0.00 (  0%)   0.00 (  0%)        0 kB (  0%)
shorten branches                 :  0.09 (  0%)   0.00 (  0%)   0.12 (  0%)        0 kB (  0%)
final                            :  0.51 (  1%)   0.00 (  0%)   0.45 (  1%)     1105 kB (  0%)
straight-line strength reduction :  0.00 (  0%)   0.01 (  0%)   0.00 (  0%)        0 kB (  0%)
initialize rtl                   :  0.00 (  0%)   0.01 (  0%)   0.00 (  0%)       12 kB (  0%)
address lowering                 :  0.01 (  0%)   0.00 (  0%)   0.00 (  0%)        0 kB (  0%)
rest of compilation              :  0.78 (  1%)   0.10 (  1%)   0.90 (  1%)     2365 kB (  0%)
remove unused locals             :  0.04 (  0%)   0.00 (  0%)   0.05 (  0%)        0 kB (  0%)
address taken                    :  0.04 (  0%)   0.01 (  0%)   0.08 (  0%)        0 kB (  0%)
TOTAL                            : 71.10          8.84         79.98         3260140 kB
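
To illustrate why the visiting order matters when a property is propagated from callers to callees until a fixpoint is reached, here is a toy sketch in Python. It is not GCC's ipa_profile code: the `propagate` helper, the six-function call chain, and the "reached" flag are all assumptions made up for this example.

```python
# Toy model only: NOT GCC's ipa_profile implementation.
# Hypothetical call chain f0 -> f1 -> ... -> f5; callers[i] lists the callers of fi.
N = 6
callers = {i: ([i - 1] if i > 0 else []) for i in range(N)}


def propagate(order, callers, seed):
    """Mark a function as reached once any of its callers is reached.

    Sweeps over `order` until nothing changes and returns
    (set of reached functions, number of sweeps performed).
    """
    reached = set(seed)
    sweeps = 0
    changed = True
    while changed:
        changed = False
        sweeps += 1
        for node in order:
            if node not in reached and any(c in reached for c in callers[node]):
                reached.add(node)
                changed = True
    return reached, sweeps


callers_first = list(range(N))                 # order that matches the propagation direction
callees_first = list(reversed(callers_first))  # order suited to the opposite direction

fast_reached, fast_sweeps = propagate(callers_first, callers, {0})
slow_reached, slow_sweeps = propagate(callees_first, callers, {0})
print(fast_sweeps, slow_sweeps)
```

Both orders reach the same result, but on this chain the caller-first order needs 2 sweeps while the callee-first order needs 6, one per function. On a deep call graph that difference turns a near-linear pass into a roughly quadratic one, which would be consistent with the "ipa profile" time in the report above.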