On Thu, 18 Oct 2018, Richard Sandiford wrote: > Richard Biener <rguent...@suse.de> writes: > > PR63155 made me pick up this old work from Steven, it turns our > > linked-list implementation to a two-mode one with one being a > > splay tree featuring O(log N) complexity for find/remove. > > > > Over Stevens original patch I added a bitmap_tree_to_vec helper > > that I use from the debug/print methods to avoid changing view > > there. In theory the bitmap iterator could get a "stack" > > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP. > > > > This can be used to fix the two biggest bottlenecks in the PRs > > testcase, namely SSA propagator worklist handling and out-of-SSA > > coalesce list building. perf shows the following data, first > > unpatched, second patched - also watch the thrid coulumn (samples) > > when comparing percentages. > > > > -O0 > > - 18.19% 17.35% 407 cc1 cc1 [.] > > bitmap_set_b▒ > > - bitmap_set_bit > > ▒ > > + 8.77% create_coalesce_list_for_region > > ▒ > > + 4.21% calculate_live_ranges > > ▒ > > + 2.02% build_ssa_conflict_graph > > ▒ > > + 1.66% insert_phi_nodes_for > > ▒ > > + 0.86% coalesce_ssa_name > > patched: > > - 12.39% 10.48% 129 cc1 cc1 [.] > > bitmap_set_b▒ > > - bitmap_set_bit > > ▒ > > + 5.27% calculate_live_ranges > > ▒ > > + 2.76% insert_phi_nodes_for > > ▒ > > + 1.90% create_coalesce_list_for_region > > ▒ > > + 1.63% build_ssa_conflict_graph > > ▒ > > + 0.35% coalesce_ssa_name > > > > -O1 > > - 17.53% 17.53% 842 cc1 cc1 [.] > > bitmap_set_b▒ > > - bitmap_set_bit > > ▒ > > + 12.39% add_ssa_edge > > ▒ > > + 1.48% create_coalesce_list_for_region > > ▒ > > + 0.82% solve_constraints > > ▒ > > + 0.71% calculate_live_ranges > > ▒ > > + 0.64% add_implicit_graph_edge > > ▒ > > + 0.41% insert_phi_nodes_for > > ▒ > > + 0.34% build_ssa_conflict_graph > > patched: > > - 5.79% 5.00% 167 cc1 cc1 [.] > > bitmap_set_b▒ > > - bitmap_set_bit > > ▒ > > + 1.41% add_ssa_edge > > ▒ > > + 0.88% calculate_live_ranges > > ▒ > > + 0.75% add_implicit_graph_edge > > ▒ > > + 0.68% solve_constraints > > ▒ > > + 0.48% insert_phi_nodes_for > > ▒ > > + 0.45% build_ssa_conflict_graph > > > > -O3 > > - 12.37% 12.34% 1145 cc1 cc1 [.] > > bitmap_set_b▒ > > - bitmap_set_bit > > ▒ > > + 9.14% add_ssa_edge > > ▒ > > + 0.80% create_coalesce_list_for_region > > ▒ > > + 0.69% add_implicit_graph_edge > > ▒ > > + 0.54% solve_constraints > > ▒ > > + 0.34% calculate_live_ranges > > ▒ > > + 0.27% insert_phi_nodes_for > > ▒ > > + 0.21% build_ssa_conflict_graph > > - 4.36% 3.86% 227 cc1 cc1 [.] > > bitmap_set_b▒ > > - bitmap_set_bit > > ▒ > > + 0.98% add_ssa_edge > > ▒ > > + 0.86% add_implicit_graph_edge > > ▒ > > + 0.64% solve_constraints > > ▒ > > + 0.57% calculate_live_ranges > > ▒ > > + 0.32% build_ssa_conflict_graph > > ▒ > > + 0.29% mark_all_vars_used_1 > > ▒ > > + 0.20% insert_phi_nodes_for > > ▒ > > + 0.16% create_coalesce_list_for_region > > > > > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. > > What's the performance like for more reasonable testcases? E.g. is there > a measurable change either way in --enable-checking=release for some gcc > .iis at -g or -O2 -g?
I did a quick check on my set of cc1files (still .i, .ii ones tend to be unusable quite quickly...). Most of them compile too quickly to make any difference appear other than noise. Multi-second differences like for PR63155 should be the exception but our O(n) bitmap implementation really makes some parts of GCC quadratic where it doesn't appear so. Is there a reason you expect it to be ever slower? Differences unpatched vs. patched for -O0 -g (first file then time): /suse/rguenther/src/cc1files/alias.i -0.36user +0.34user /suse/rguenther/src/cc1files/alloc-pool.i 0.04user /suse/rguenther/src/cc1files/attribs.i -0.16user +0.08user /suse/rguenther/src/cc1files/auto-inc-dec.i -0.10user +0.09user /suse/rguenther/src/cc1files/bb-reorder.i -0.18user +0.14user /suse/rguenther/src/cc1files/bitmap.i -0.13user +0.09user /suse/rguenther/src/cc1files/bt-load.i -0.26user +0.24user /suse/rguenther/src/cc1files/builtins.i -0.82user +0.69user ... skipping similar changes ... /suse/rguenther/src/cc1files/cfg.i -0.19user +0.21user (so there are cases of larger time) /suse/rguenther/src/cc1files/insn-attrtab.i -3.61user +3.48user /suse/rguenther/src/cc1files/insn-automata.i -0.51user +0.58user /suse/rguenther/src/cc1files/insn-emit.i 2.46user /suse/rguenther/src/cc1files/insn-opinit.i -0.19user +0.30user /suse/rguenther/src/cc1files/insn-output.i -0.56user +0.69user and at -O2 -g /suse/rguenther/src/cc1files/alias.i -0.87user +0.81user /suse/rguenther/src/cc1files/alloc-pool.i -0.08user +0.09user /suse/rguenther/src/cc1files/attribs.i -0.32user +0.33user /suse/rguenther/src/cc1files/auto-inc-dec.i -0.17user +0.24user /suse/rguenther/src/cc1files/bb-reorder.i -0.77user +0.61user /suse/rguenther/src/cc1files/bitmap.i -0.47user +0.53user /suse/rguenther/src/cc1files/bt-load.i 0.55user ... /suse/rguenther/src/cc1files/dwarf2out.i -5.31user +5.30user ... I'd say any difference is fake news. Richard.