https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90316
Bug ID: 90316 Summary: large compile time increase in opt / alias stmt walking for Go example Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: thanm at google dot com Target Milestone: --- Created attachment 46276 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46276&action=edit Go testcase The attached Go file when built with gccgo tip takes a very long time to compile; I think the test case may be exposing some sort of non-linear behavior in tree-pre. Original problem was discovered with an application internal to Google (in auto-generated code); the testcase I am submitting is hand-generated but has the same control flow characteristics. The function of interest has no loops, but there is if-then-else control flow, and there are about 9000 bbs. Compiling with GCC 7 takes about 11 seconds; compiling with tip takes about 7 minutes (original Google testcase is larger and takes about 45 minutes to compile). Compile with: gccgo -c -O2 -g generated.go -o /tmp/gen.o Profile from "pprof": (pprof) top15 Showing nodes accounting for 381.91s, 96.34% of 396.42s total Dropped 1064 nodes (cum <= 1.98s) Showing top 15 nodes out of 50 flat flat% sum% cum cum% 84.13s 21.22% 21.22% 202.04s 50.97% et_splay 75.82s 19.13% 40.35% 80.73s 20.36% dominated_by_p 52.73s 13.30% 53.65% 52.73s 13.30% et_recomp_min (inline) 42.98s 10.84% 64.49% 384.60s 97.02% get_continuation_for_phi 41.95s 10.58% 75.07% 41.95s 10.58% set_depth_add (inline) 29.52s 7.45% 82.52% 29.52s 7.45% tree_check (inline) 19.98s 5.04% 87.56% 223.37s 56.35% et_below 10.32s 2.60% 90.16% 10.32s 2.60% set_depth (inline) 7.54s 1.90% 92.07% 7.54s 1.90% set_prev (inline) 6.73s 1.70% 93.76% 6.73s 1.70% set_next (inline) 4.93s 1.24% 95.01% 4.93s 1.24% dom_convert_dir_to_idx (inline) 3.06s 0.77% 95.78% 3.06s 0.77% bitmap_list_find_element (inline) 1.65s 0.42% 96.20% 3.20s 0.81% gimple_vuse (inline) 0.41s 0.1% 96.30% 4.32s 1.09% insert_updated_phi_nodes_for 0.16s 0.04% 96.34% 384.15s 96.90% maybe_skip_until (inline) Top offenders include "maybe_skip_until" and "get_continuation_for_phi". Time report: Time variable usr sys wall GGC phase parsing : 0.15 ( 0%) 0.01 ( 3%) 0.15 ( 0%) 7482 kB ( 12%) phase opt and generate : 191.73 (100%) 0.37 ( 97%) 192.11 (100%) 52645 kB ( 87%) phase last asm : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 116 kB ( 0%) garbage collection : 0.08 ( 0%) 0.00 ( 0%) 0.08 ( 0%) 0 kB ( 0%) callgraph construction : 0.01 ( 0%) 0.01 ( 3%) 0.01 ( 0%) 6204 kB ( 10%) callgraph optimization : 0.00 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 kB ( 0%) ipa function summary : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1 kB ( 0%) ipa dead code removal : 0.01 ( 0%) 0.01 ( 3%) 0.02 ( 0%) 0 kB ( 0%) ipa profile : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%) cfg construction : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 412 kB ( 1%) cfg cleanup : 0.01 ( 0%) 0.01 ( 3%) 0.02 ( 0%) 56 kB ( 0%) CFG verifier : 0.22 ( 0%) 0.01 ( 3%) 0.26 ( 0%) 0 kB ( 0%) trivially dead code : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 kB ( 0%) df scan insns : 0.02 ( 0%) 0.01 ( 3%) 0.02 ( 0%) 0 kB ( 0%) df multiple defs : 0.25 ( 0%) 0.00 ( 0%) 0.25 ( 0%) 0 kB ( 0%) df reaching defs : 0.09 ( 0%) 0.01 ( 3%) 0.08 ( 0%) 0 kB ( 0%) df live regs : 0.12 ( 0%) 0.00 ( 0%) 0.13 ( 0%) 0 kB ( 0%) df live&initialized regs : 0.06 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 0 kB ( 0%) df must-initialized regs : 0.03 ( 0%) 0.01 ( 3%) 0.03 ( 0%) 0 kB ( 0%) df use-def / def-use chains : 0.19 ( 0%) 0.04 ( 11%) 0.25 ( 0%) 0 kB ( 0%) df reg dead/unused notes : 0.07 ( 0%) 0.00 ( 0%) 0.09 ( 0%) 530 kB ( 1%) register information : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%) alias analysis : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 1408 kB ( 2%) alias stmt walking : 184.44 ( 96%) 0.07 ( 18%) 184.51 ( 96%) 37 kB ( 0%) rebuild jump labels : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%) parser (global) : 0.15 ( 0%) 0.00 ( 0%) 0.15 ( 0%) 7182 kB ( 12%) inline parameters : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 1 kB ( 0%) tree gimplify : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 5445 kB ( 9%) tree eh : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 525 kB ( 1%) tree CFG construction : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 2257 kB ( 4%) tree CFG cleanup : 0.04 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 127 kB ( 0%) tree tail merge : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 543 kB ( 1%) tree copy propagation : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%) tree PTA : 0.05 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 63 kB ( 0%) tree SSA rewrite : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 1102 kB ( 2%) tree SSA other : 0.00 ( 0%) 0.01 ( 3%) 0.02 ( 0%) 0 kB ( 0%) tree SSA incremental : 2.43 ( 1%) 0.00 ( 0%) 2.42 ( 1%) 2156 kB ( 4%) tree operand scan : 0.02 ( 0%) 0.02 ( 5%) 0.04 ( 0%) 1141 kB ( 2%) dominator optimization : 0.07 ( 0%) 0.04 ( 11%) 0.11 ( 0%) 4046 kB ( 7%) backwards jump threading : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%) tree CCP : 0.03 ( 0%) 0.01 ( 3%) 0.05 ( 0%) 585 kB ( 1%) tree PRE : 0.06 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 1410 kB ( 2%) tree FRE : 0.03 ( 0%) 0.02 ( 5%) 0.06 ( 0%) 752 kB ( 1%) tree linearize phis : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 1 kB ( 0%) tree backward propagate : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%) tree forward propagate : 0.03 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%) tree conservative DCE : 0.02 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 128 kB ( 0%) tree aggressive DCE : 0.04 ( 0%) 0.01 ( 3%) 0.05 ( 0%) 2 kB ( 0%) tree DSE : 0.14 ( 0%) 0.00 ( 0%) 0.15 ( 0%) 0 kB ( 0%) PHI merge : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%) tree SSA uncprop : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%) tree SSA verifier : 0.30 ( 0%) 0.01 ( 3%) 0.30 ( 0%) 0 kB ( 0%) tree STMT verifier : 0.38 ( 0%) 0.03 ( 8%) 0.36 ( 0%) 0 kB ( 0%) tree strlen optimization : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%) callgraph verifier : 0.02 ( 0%) 0.01 ( 3%) 0.01 ( 0%) 0 kB ( 0%) dominance frontiers : 0.24 ( 0%) 0.00 ( 0%) 0.26 ( 0%) 0 kB ( 0%) dominance computation : 0.13 ( 0%) 0.00 ( 0%) 0.12 ( 0%) 0 kB ( 0%) control dependences : 0.08 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 0 kB ( 0%) out of ssa : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 63 kB ( 0%) expand vars : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 788 kB ( 1%) expand : 0.01 ( 0%) 0.01 ( 3%) 0.03 ( 0%) 5107 kB ( 8%) post expand cleanups : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%) varconst : 0.00 ( 0%) 0.01 ( 3%) 0.00 ( 0%) 300 kB ( 0%) forward prop : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 281 kB ( 0%) CSE : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 113 kB ( 0%) dead code elimination : 0.05 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 0 kB ( 0%) dead store elim1 : 0.10 ( 0%) 0.00 ( 0%) 0.10 ( 0%) 57 kB ( 0%) dead store elim2 : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 58 kB ( 0%) loop init : 0.06 ( 0%) 0.00 ( 0%) 0.08 ( 0%) 2 kB ( 0%) loop invariant motion : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%) CPROP : 0.36 ( 0%) 0.01 ( 3%) 0.38 ( 0%) 169 kB ( 0%) PRE : 0.31 ( 0%) 0.00 ( 0%) 0.32 ( 0%) 1518 kB ( 3%) CSE 2 : 0.06 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 309 kB ( 1%) branch prediction : 0.02 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%) combiner : 0.03 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 56 kB ( 0%) integrated RA : 0.12 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 3073 kB ( 5%) LRA non-specific : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 kB ( 0%) LRA create live ranges : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%) LRA hard reg assignment : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%) reload CSE regs : 0.04 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 677 kB ( 1%) ree : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%) thread pro- & epilogue : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 3 kB ( 0%) peephole 2 : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 525 kB ( 1%) hard reg cprop : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 kB ( 0%) scheduling 2 : 0.09 ( 0%) 0.01 ( 3%) 0.09 ( 0%) 284 kB ( 0%) reorder blocks : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 770 kB ( 1%) shorten branches : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%) final : 0.03 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 3229 kB ( 5%) variable output : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 571 kB ( 1%) symout : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 1732 kB ( 3%) variable tracking : 0.03 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 1862 kB ( 3%) var-tracking dataflow : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%) var-tracking emit : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 56 kB ( 0%) rest of compilation : 0.05 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 378 kB ( 1%) verify RTL sharing : 0.15 ( 0%) 0.00 ( 0%) 0.18 ( 0%) 0 kB ( 0%) repair loop structures : 0.02 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%) TOTAL : 191.89 0.38 192.27 60410 kB