On Thu, 18 Oct 2018, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > PR63155 made me pick up this old work from Steven, it turns our
> > linked-list implementation to a two-mode one with one being a
> > splay tree featuring O(log N) complexity for find/remove.
> >
> > Over Stevens original patch I added a bitmap_tree_to_vec helper
> > that I use from the debug/print methods to avoid changing view
> > there.  In theory the bitmap iterator could get a "stack"
> > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
> >
> > This can be used to fix the two biggest bottlenecks in the PRs
> > testcase, namely SSA propagator worklist handling and out-of-SSA
> > coalesce list building.  perf shows the following data, first
> > unpatched, second patched - also watch the thrid coulumn (samples)
> > when comparing percentages.
> >
> > -O0
> > -   18.19%    17.35%           407  cc1      cc1               [.] 
> > bitmap_set_b▒
> >    - bitmap_set_bit                                                         
> >    ▒
> >       + 8.77% create_coalesce_list_for_region                               
> >    ▒
> >       + 4.21% calculate_live_ranges                                         
> >    ▒
> >       + 2.02% build_ssa_conflict_graph                                      
> >    ▒
> >       + 1.66% insert_phi_nodes_for                                          
> >    ▒
> >       + 0.86% coalesce_ssa_name                      
> > patched:
> > -   12.39%    10.48%           129  cc1      cc1               [.] 
> > bitmap_set_b▒
> >    - bitmap_set_bit                                                         
> >    ▒
> >       + 5.27% calculate_live_ranges                                         
> >    ▒
> >       + 2.76% insert_phi_nodes_for                                          
> >    ▒
> >       + 1.90% create_coalesce_list_for_region                               
> >    ▒
> >       + 1.63% build_ssa_conflict_graph                                      
> >    ▒
> >       + 0.35% coalesce_ssa_name                           
> >
> > -O1
> > -   17.53%    17.53%           842  cc1      cc1               [.] 
> > bitmap_set_b▒
> >    - bitmap_set_bit                                                         
> >    ▒
> >       + 12.39% add_ssa_edge                                                 
> >    ▒
> >       + 1.48% create_coalesce_list_for_region                               
> >    ▒
> >       + 0.82% solve_constraints                                             
> >    ▒
> >       + 0.71% calculate_live_ranges                                         
> >    ▒
> >       + 0.64% add_implicit_graph_edge                                       
> >    ▒
> >       + 0.41% insert_phi_nodes_for                                          
> >    ▒
> >       + 0.34% build_ssa_conflict_graph                      
> > patched:
> > -    5.79%     5.00%           167  cc1      cc1               [.] 
> > bitmap_set_b▒
> >    - bitmap_set_bit                                                         
> >    ▒
> >       + 1.41% add_ssa_edge                                                  
> >    ▒
> >       + 0.88% calculate_live_ranges                                         
> >    ▒
> >       + 0.75% add_implicit_graph_edge                                       
> >    ▒
> >       + 0.68% solve_constraints                                             
> >    ▒
> >       + 0.48% insert_phi_nodes_for                                          
> >    ▒
> >       + 0.45% build_ssa_conflict_graph                   
> >
> > -O3
> > -   12.37%    12.34%          1145  cc1      cc1               [.] 
> > bitmap_set_b▒
> >    - bitmap_set_bit                                                         
> >    ▒
> >       + 9.14% add_ssa_edge                                                  
> >    ▒
> >       + 0.80% create_coalesce_list_for_region                               
> >    ▒
> >       + 0.69% add_implicit_graph_edge                                       
> >    ▒
> >       + 0.54% solve_constraints                                             
> >    ▒
> >       + 0.34% calculate_live_ranges                                         
> >    ▒
> >       + 0.27% insert_phi_nodes_for                                          
> >    ▒
> >       + 0.21% build_ssa_conflict_graph                 
> > -    4.36%     3.86%           227  cc1      cc1               [.] 
> > bitmap_set_b▒
> >    - bitmap_set_bit                                                         
> >    ▒
> >       + 0.98% add_ssa_edge                                                  
> >    ▒
> >       + 0.86% add_implicit_graph_edge                                       
> >    ▒
> >       + 0.64% solve_constraints                                             
> >    ▒
> >       + 0.57% calculate_live_ranges                                         
> >    ▒
> >       + 0.32% build_ssa_conflict_graph                                      
> >    ▒
> >       + 0.29% mark_all_vars_used_1                                          
> >    ▒
> >       + 0.20% insert_phi_nodes_for                                          
> >    ▒
> >       + 0.16% create_coalesce_list_for_region             
> >
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> What's the performance like for more reasonable testcases?  E.g. is there
> a measurable change either way in --enable-checking=release for some gcc
> .iis at -g or -O2 -g?

I did a quick check on my set of cc1files (still .i, .ii ones tend to
be unusable quite quickly...).  Most of them compile too quickly
to make any difference appear other than noise.  Multi-second differences
like for PR63155 should be the exception but our O(n) bitmap
implementation really makes some parts of GCC quadratic where it
doesn't appear so.

Is there a reason you expect it to be ever slower?

Differences unpatched vs. patched for -O0 -g (first file then time):

 /suse/rguenther/src/cc1files/alias.i
-0.36user
+0.34user
 /suse/rguenther/src/cc1files/alloc-pool.i
 0.04user
 /suse/rguenther/src/cc1files/attribs.i
-0.16user
+0.08user
 /suse/rguenther/src/cc1files/auto-inc-dec.i
-0.10user
+0.09user
 /suse/rguenther/src/cc1files/bb-reorder.i
-0.18user
+0.14user
 /suse/rguenther/src/cc1files/bitmap.i
-0.13user
+0.09user
 /suse/rguenther/src/cc1files/bt-load.i
-0.26user
+0.24user
 /suse/rguenther/src/cc1files/builtins.i
-0.82user
+0.69user
... skipping similar changes ...
 /suse/rguenther/src/cc1files/cfg.i
-0.19user
+0.21user
(so there are cases of larger time)
 /suse/rguenther/src/cc1files/insn-attrtab.i
-3.61user
+3.48user
 /suse/rguenther/src/cc1files/insn-automata.i
-0.51user
+0.58user
 /suse/rguenther/src/cc1files/insn-emit.i
 2.46user
 /suse/rguenther/src/cc1files/insn-opinit.i
-0.19user
+0.30user
 /suse/rguenther/src/cc1files/insn-output.i
-0.56user
+0.69user

and at -O2 -g

 /suse/rguenther/src/cc1files/alias.i
-0.87user
+0.81user
 /suse/rguenther/src/cc1files/alloc-pool.i
-0.08user
+0.09user
 /suse/rguenther/src/cc1files/attribs.i
-0.32user
+0.33user
 /suse/rguenther/src/cc1files/auto-inc-dec.i
-0.17user
+0.24user
 /suse/rguenther/src/cc1files/bb-reorder.i
-0.77user
+0.61user
 /suse/rguenther/src/cc1files/bitmap.i
-0.47user
+0.53user
 /suse/rguenther/src/cc1files/bt-load.i
 0.55user
...
 /suse/rguenther/src/cc1files/dwarf2out.i
-5.31user
+5.30user
...

I'd say any difference is fake news.

Richard.

Reply via email to