https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90316

            Bug ID: 90316
           Summary: large compile time increase in opt / alias stmt
                    walking for Go example
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thanm at google dot com
  Target Milestone: ---

Created attachment 46276
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46276&action=edit
Go testcase

The attached Go file when built with gccgo tip takes a very long time to
compile; I think the test case may be exposing some sort of non-linear behavior
in tree-pre.  Original problem was discovered with an application internal to
Google (in auto-generated code); the testcase I am submitting is hand-generated
but has the same control flow characteristics. 

The function of interest has no loops, but there is if-then-else control flow,
and there are about 9000 bbs. Compiling with GCC 7 takes about 11 seconds;
compiling with tip takes about 7 minutes (original Google testcase is larger
and takes about 45 minutes to compile).

Compile with: gccgo -c -O2 -g generated.go -o /tmp/gen.o

Profile from "pprof":

(pprof) top15
Showing nodes accounting for 381.91s, 96.34% of 396.42s total
Dropped 1064 nodes (cum <= 1.98s)
Showing top 15 nodes out of 50
      flat  flat%   sum%        cum   cum%
    84.13s 21.22% 21.22%    202.04s 50.97%  et_splay
    75.82s 19.13% 40.35%     80.73s 20.36%  dominated_by_p
    52.73s 13.30% 53.65%     52.73s 13.30%  et_recomp_min (inline)
    42.98s 10.84% 64.49%    384.60s 97.02%  get_continuation_for_phi
    41.95s 10.58% 75.07%     41.95s 10.58%  set_depth_add (inline)
    29.52s  7.45% 82.52%     29.52s  7.45%  tree_check (inline)
    19.98s  5.04% 87.56%    223.37s 56.35%  et_below
    10.32s  2.60% 90.16%     10.32s  2.60%  set_depth (inline)
     7.54s  1.90% 92.07%      7.54s  1.90%  set_prev (inline)
     6.73s  1.70% 93.76%      6.73s  1.70%  set_next (inline)
     4.93s  1.24% 95.01%      4.93s  1.24%  dom_convert_dir_to_idx (inline)
     3.06s  0.77% 95.78%      3.06s  0.77%  bitmap_list_find_element (inline)
     1.65s  0.42% 96.20%      3.20s  0.81%  gimple_vuse (inline)
     0.41s   0.1% 96.30%      4.32s  1.09%  insert_updated_phi_nodes_for
     0.16s  0.04% 96.34%    384.15s 96.90%  maybe_skip_until (inline)

Top offenders include "maybe_skip_until" and "get_continuation_for_phi".

Time report:

Time variable                                   usr           sys          wall
              GGC
 phase parsing                      :   0.15 (  0%)   0.01 (  3%)   0.15 (  0%)
   7482 kB ( 12%)
 phase opt and generate             : 191.73 (100%)   0.37 ( 97%) 192.11 (100%)
  52645 kB ( 87%)
 phase last asm                     :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
    116 kB (  0%)
 garbage collection                 :   0.08 (  0%)   0.00 (  0%)   0.08 (  0%)
      0 kB (  0%)
 callgraph construction             :   0.01 (  0%)   0.01 (  3%)   0.01 (  0%)
   6204 kB ( 10%)
 callgraph optimization             :   0.00 (  0%)   0.00 (  0%)   0.03 (  0%)
      0 kB (  0%)
 ipa function summary               :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
      1 kB (  0%)
 ipa dead code removal              :   0.01 (  0%)   0.01 (  3%)   0.02 (  0%)
      0 kB (  0%)
 ipa profile                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
      0 kB (  0%)
 cfg construction                   :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
    412 kB (  1%)
 cfg cleanup                        :   0.01 (  0%)   0.01 (  3%)   0.02 (  0%)
     56 kB (  0%)
 CFG verifier                       :   0.22 (  0%)   0.01 (  3%)   0.26 (  0%)
      0 kB (  0%)
 trivially dead code                :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)
      0 kB (  0%)
 df scan insns                      :   0.02 (  0%)   0.01 (  3%)   0.02 (  0%)
      0 kB (  0%)
 df multiple defs                   :   0.25 (  0%)   0.00 (  0%)   0.25 (  0%)
      0 kB (  0%)
 df reaching defs                   :   0.09 (  0%)   0.01 (  3%)   0.08 (  0%)
      0 kB (  0%)
 df live regs                       :   0.12 (  0%)   0.00 (  0%)   0.13 (  0%)
      0 kB (  0%)
 df live&initialized regs           :   0.06 (  0%)   0.00 (  0%)   0.07 (  0%)
      0 kB (  0%)
 df must-initialized regs           :   0.03 (  0%)   0.01 (  3%)   0.03 (  0%)
      0 kB (  0%)
 df use-def / def-use chains        :   0.19 (  0%)   0.04 ( 11%)   0.25 (  0%)
      0 kB (  0%)
 df reg dead/unused notes           :   0.07 (  0%)   0.00 (  0%)   0.09 (  0%)
    530 kB (  1%)
 register information               :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
      0 kB (  0%)
 alias analysis                     :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)
   1408 kB (  2%)
 alias stmt walking                 : 184.44 ( 96%)   0.07 ( 18%) 184.51 ( 96%)
     37 kB (  0%)
 rebuild jump labels                :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
      0 kB (  0%)
 parser (global)                    :   0.15 (  0%)   0.00 (  0%)   0.15 (  0%)
   7182 kB ( 12%)
 inline parameters                  :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)
      1 kB (  0%)
 tree gimplify                      :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)
   5445 kB (  9%)
 tree eh                            :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
    525 kB (  1%)
 tree CFG construction              :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
   2257 kB (  4%)
 tree CFG cleanup                   :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%)
    127 kB (  0%)
 tree tail merge                    :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
    543 kB (  1%)
 tree copy propagation              :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
      0 kB (  0%)
 tree PTA                           :   0.05 (  0%)   0.00 (  0%)   0.04 (  0%)
     63 kB (  0%)
 tree SSA rewrite                   :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
   1102 kB (  2%)
 tree SSA other                     :   0.00 (  0%)   0.01 (  3%)   0.02 (  0%)
      0 kB (  0%)
 tree SSA incremental               :   2.43 (  1%)   0.00 (  0%)   2.42 (  1%)
   2156 kB (  4%)
 tree operand scan                  :   0.02 (  0%)   0.02 (  5%)   0.04 (  0%)
   1141 kB (  2%)
 dominator optimization             :   0.07 (  0%)   0.04 ( 11%)   0.11 (  0%)
   4046 kB (  7%)
 backwards jump threading           :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
      0 kB (  0%)
 tree CCP                           :   0.03 (  0%)   0.01 (  3%)   0.05 (  0%)
    585 kB (  1%)
 tree PRE                           :   0.06 (  0%)   0.00 (  0%)   0.05 (  0%)
   1410 kB (  2%)
 tree FRE                           :   0.03 (  0%)   0.02 (  5%)   0.06 (  0%)
    752 kB (  1%)
 tree linearize phis                :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
      1 kB (  0%)
 tree backward propagate            :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
      0 kB (  0%)
 tree forward propagate             :   0.03 (  0%)   0.00 (  0%)   0.00 (  0%)
      0 kB (  0%)
 tree conservative DCE              :   0.02 (  0%)   0.00 (  0%)   0.04 (  0%)
    128 kB (  0%)
 tree aggressive DCE                :   0.04 (  0%)   0.01 (  3%)   0.05 (  0%)
      2 kB (  0%)
 tree DSE                           :   0.14 (  0%)   0.00 (  0%)   0.15 (  0%)
      0 kB (  0%)
 PHI merge                          :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
      0 kB (  0%)
 tree SSA uncprop                   :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
      0 kB (  0%)
 tree SSA verifier                  :   0.30 (  0%)   0.01 (  3%)   0.30 (  0%)
      0 kB (  0%)
 tree STMT verifier                 :   0.38 (  0%)   0.03 (  8%)   0.36 (  0%)
      0 kB (  0%)
 tree strlen optimization           :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
      0 kB (  0%)
 callgraph verifier                 :   0.02 (  0%)   0.01 (  3%)   0.01 (  0%)
      0 kB (  0%)
 dominance frontiers                :   0.24 (  0%)   0.00 (  0%)   0.26 (  0%)
      0 kB (  0%)
 dominance computation              :   0.13 (  0%)   0.00 (  0%)   0.12 (  0%)
      0 kB (  0%)
 control dependences                :   0.08 (  0%)   0.00 (  0%)   0.07 (  0%)
      0 kB (  0%)
 out of ssa                         :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)
     63 kB (  0%)
 expand vars                        :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
    788 kB (  1%)
 expand                             :   0.01 (  0%)   0.01 (  3%)   0.03 (  0%)
   5107 kB (  8%)
 post expand cleanups               :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
      0 kB (  0%)
 varconst                           :   0.00 (  0%)   0.01 (  3%)   0.00 (  0%)
    300 kB (  0%)
 forward prop                       :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)
    281 kB (  0%)
 CSE                                :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)
    113 kB (  0%)
 dead code elimination              :   0.05 (  0%)   0.00 (  0%)   0.05 (  0%)
      0 kB (  0%)
 dead store elim1                   :   0.10 (  0%)   0.00 (  0%)   0.10 (  0%)
     57 kB (  0%)
 dead store elim2                   :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)
     58 kB (  0%)
 loop init                          :   0.06 (  0%)   0.00 (  0%)   0.08 (  0%)
      2 kB (  0%)
 loop invariant motion              :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
      0 kB (  0%)
 CPROP                              :   0.36 (  0%)   0.01 (  3%)   0.38 (  0%)
    169 kB (  0%)
 PRE                                :   0.31 (  0%)   0.00 (  0%)   0.32 (  0%)
   1518 kB (  3%)
 CSE 2                              :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)
    309 kB (  1%)
 branch prediction                  :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)
      0 kB (  0%)
 combiner                           :   0.03 (  0%)   0.00 (  0%)   0.01 (  0%)
     56 kB (  0%)
 integrated RA                      :   0.12 (  0%)   0.00 (  0%)   0.11 (  0%)
   3073 kB (  5%)
 LRA non-specific                   :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)
      0 kB (  0%)
 LRA create live ranges             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
      0 kB (  0%)
 LRA hard reg assignment            :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
      0 kB (  0%)
 reload CSE regs                    :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%)
    677 kB (  1%)
 ree                                :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)
      0 kB (  0%)
 thread pro- & epilogue             :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)
      3 kB (  0%)
 peephole 2                         :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
    525 kB (  1%)
 hard reg cprop                     :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)
      0 kB (  0%)
 scheduling 2                       :   0.09 (  0%)   0.01 (  3%)   0.09 (  0%)
    284 kB (  0%)
 reorder blocks                     :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)
    770 kB (  1%)
 shorten branches                   :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
      0 kB (  0%)
 final                              :   0.03 (  0%)   0.00 (  0%)   0.02 (  0%)
   3229 kB (  5%)
 variable output                    :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
    571 kB (  1%)
 symout                             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
   1732 kB (  3%)
 variable tracking                  :   0.03 (  0%)   0.00 (  0%)   0.02 (  0%)
   1862 kB (  3%)
 var-tracking dataflow              :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)
      0 kB (  0%)
 var-tracking emit                  :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)
     56 kB (  0%)
 rest of compilation                :   0.05 (  0%)   0.00 (  0%)   0.07 (  0%)
    378 kB (  1%)
 verify RTL sharing                 :   0.15 (  0%)   0.00 (  0%)   0.18 (  0%)
      0 kB (  0%)
 repair loop structures             :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)
      0 kB (  0%)
 TOTAL                              : 191.89          0.38        192.27       
  60410 kB

Reply via email to