On Wed, Mar 27, 2024 at 07:44:28PM +0100, Michael Matz wrote: > Hey, > > On Wed, 27 Mar 2024, Jakub Jelinek wrote: > > > > @@ -1712,12 +1711,9 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs) > > > gcc_checking_assert (bb_index > > > < (unsigned) last_basic_block_for_fn (cfun)); > > > > > > - EXECUTE_IF_AND_COMPL_IN_BITMAP (&dfs[bb_index], > > > phi_insertion_points, > > > - 0, i, bi) > > > - { > > > + EXECUTE_IF_SET_IN_BITMAP (&dfs[bb_index], 0, i, bi) > > > + if (bitmap_set_bit (phi_insertion_points, i)) > > > bitmap_set_bit (work_set, i); > > > - bitmap_set_bit (phi_insertion_points, i); > > > - } > > > } > > > > I don't understand why the above is better. > > Wouldn't it be best to do > > bitmap_ior_and_compl_into (work_set, &dfs[bb_index], > > phi_insertion_points); > > bitmap_ior_into (phi_insertion_points, &dfs[bb_index]); > > ? > > I had the same hunch, but: > > 1) One would have to make work_set be non-tree-view again (which with the > current structure is a wash anyway, and that makes sense as accesses to > work_set aren't heavily random here). > > 2) But doing that and using bitmap_ior.._into is still measurably slower: > on a reduced testcase with -O0 -fno-checking, proposed structure > (tree-view or not-tree-view workset doesn't matter): > > tree SSA rewrite : 14.93 ( 12%) 0.01 ( 2%) 14.95 ( > 12%) 27M ( 8%) > > with non-tree-view, and your suggestion: > > tree SSA rewrite : 20.68 ( 12%) 0.02 ( 4%) 20.75 ( > 12%) 27M ( 8%) > > I can only speculate that the usually extreme sparsity of the bitmaps in > question make the setup costs of the two bitmap_ior calls actually more > expensive than the often skipped second call to bitmap_set_bit in Richis > proposed structure. (That or cache effects)
Ok then. Jakub