Richard Biener <rguent...@suse.de> writes: > PR63155 made me pick up this old work from Steven, it turns our > linked-list implementation to a two-mode one with one being a > splay tree featuring O(log N) complexity for find/remove. > > Over Stevens original patch I added a bitmap_tree_to_vec helper > that I use from the debug/print methods to avoid changing view > there. In theory the bitmap iterator could get a "stack" > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP. > > This can be used to fix the two biggest bottlenecks in the PRs > testcase, namely SSA propagator worklist handling and out-of-SSA > coalesce list building. perf shows the following data, first > unpatched, second patched - also watch the thrid coulumn (samples) > when comparing percentages. > > -O0 > - 18.19% 17.35% 407 cc1 cc1 [.] > bitmap_set_b▒ > - bitmap_set_bit > ▒ > + 8.77% create_coalesce_list_for_region > ▒ > + 4.21% calculate_live_ranges > ▒ > + 2.02% build_ssa_conflict_graph > ▒ > + 1.66% insert_phi_nodes_for > ▒ > + 0.86% coalesce_ssa_name > patched: > - 12.39% 10.48% 129 cc1 cc1 [.] > bitmap_set_b▒ > - bitmap_set_bit > ▒ > + 5.27% calculate_live_ranges > ▒ > + 2.76% insert_phi_nodes_for > ▒ > + 1.90% create_coalesce_list_for_region > ▒ > + 1.63% build_ssa_conflict_graph > ▒ > + 0.35% coalesce_ssa_name > > -O1 > - 17.53% 17.53% 842 cc1 cc1 [.] > bitmap_set_b▒ > - bitmap_set_bit > ▒ > + 12.39% add_ssa_edge > ▒ > + 1.48% create_coalesce_list_for_region > ▒ > + 0.82% solve_constraints > ▒ > + 0.71% calculate_live_ranges > ▒ > + 0.64% add_implicit_graph_edge > ▒ > + 0.41% insert_phi_nodes_for > ▒ > + 0.34% build_ssa_conflict_graph > patched: > - 5.79% 5.00% 167 cc1 cc1 [.] > bitmap_set_b▒ > - bitmap_set_bit > ▒ > + 1.41% add_ssa_edge > ▒ > + 0.88% calculate_live_ranges > ▒ > + 0.75% add_implicit_graph_edge > ▒ > + 0.68% solve_constraints > ▒ > + 0.48% insert_phi_nodes_for > ▒ > + 0.45% build_ssa_conflict_graph > > -O3 > - 12.37% 12.34% 1145 cc1 cc1 [.] > bitmap_set_b▒ > - bitmap_set_bit > ▒ > + 9.14% add_ssa_edge > ▒ > + 0.80% create_coalesce_list_for_region > ▒ > + 0.69% add_implicit_graph_edge > ▒ > + 0.54% solve_constraints > ▒ > + 0.34% calculate_live_ranges > ▒ > + 0.27% insert_phi_nodes_for > ▒ > + 0.21% build_ssa_conflict_graph > - 4.36% 3.86% 227 cc1 cc1 [.] > bitmap_set_b▒ > - bitmap_set_bit > ▒ > + 0.98% add_ssa_edge > ▒ > + 0.86% add_implicit_graph_edge > ▒ > + 0.64% solve_constraints > ▒ > + 0.57% calculate_live_ranges > ▒ > + 0.32% build_ssa_conflict_graph > ▒ > + 0.29% mark_all_vars_used_1 > ▒ > + 0.20% insert_phi_nodes_for > ▒ > + 0.16% create_coalesce_list_for_region > > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
What's the performance like for more reasonable testcases? E.g. is there a measurable change either way in --enable-checking=release for some gcc .iis at -g or -O2 -g? Thanks, Richard