Hello Richard:
On 03/11/23 7:06 pm, Richard Biener wrote: > On Fri, Nov 3, 2023 at 11:20 AM Ajit Agarwal <aagar...@linux.ibm.com> wrote: >> >> Hello Richard: >> >> On 03/11/23 12:51 pm, Richard Biener wrote: >>> On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal <aagar...@linux.ibm.com> wrote: >>>> >>>> Hello All: >>>> > [...] >>>> >>>> High register pressure region is the region where there are live-in of >>>> early blocks that has been modified by the early block. If there are >>>> modification of the variables in best block that are live-in in early >>>> block that are live-out of best block. >>> >>> ?! Parse error. >>> >> >> I didnt understand what you meant here. Please suggest. > > I can't even guess what that paragraph means. It fails at a > parsing level already, I can't even start to reason about what > the sentences mean. Sorry for that I will modify. > >>>> Bootstrapped and regtested on powerpc64-linux-gnu. >>> >>> What's the effect on code generation? >>> >>> Note that live is a quadratic problem while sinking was not. You >>> are effectively making the pass unfit for -O1. >>> >>> You are computing "liveness" on GIMPLE where within EBBs there >>> isn't really any particular order of stmts, so it's kind of a garbage >>> heuristic. Likewise you are not computing the effect that moving >>> a stmt has on liveness as far as I can see but you are just identifying >>> some odd metrics (I don't really understand them) to rank blocks, >>> not even taking the register file size into account. >> >> >> if the live out of best_bb <= live out of early_bb, that shows >> that there are modification in best_bb. > > Hm? Do you maybe want to say that if live_out (bb) < live_in (bb) > then some variables die during the execution of bb? live_out (bb) < live_in(bb) means in bb there may be KILL (Variables) and there are more GEN (Variables). Otherwise, > if live_out (early) > live_out (best) then somewhere on the path > from early to best some variables die. > If live_out (early) > live_out (best) means there are more GEN (Variables) between path from early to best. >> Then it's >> safer to move statements in best_bb as there are lesser interfering >> live variables in best_bb. > > consider a stmt > > a = b + c; > > where b and c die at the definition of a. Then moving the stmt > down from early_bb means you increase live_out (early_bb) by > one. So why's that "safer" then? Of course live_out (best_bb) > also increases by two then. > If b and c die at the definition of a and generates a live_in(early_bb) would be live_out(early_bb) - 2 + 1. the moving the stmt from early_bb down to best_bb increases live_out(early_bb) by one and live_out (best_bb) depends on the LIVEIN(for all successors of best_bb) which may be same even if we move down. There are chances that live_out (best_bb) greater if for all successors of best_bb there are more GEN ( variables). If live_out (best_bb) is less means there more KILL (Variables) in successors of best_bb. With my heuristics live_out (best_bb ) > live_out (early_bb) then we dont do code motion as there are chances of more interfering live ranges. If liveout(best_bb) <= liveout (early_bb) then we do code motion as there is there are more KILL(for all successors of best_bb) and there is less chance of interfering live ranges. With moving down above stmt from early_bb to best_bb increases live_out(early_bb) by one but live_out(best_bb) may be remains. If live_out (early_bb) increase by 1 but if it becomes > live_out(best_bb) then we dont do code motion if we have more GEN (Variables) in best_bb otherewise its safer to do code motion. for above statement a = b + c dies b and c and generates a in early_bb then liveout(early_bb) increases by 1. If before moving if liveout (best_bb) is 10 and then liveout (early_bb) becomes > 10 then we dont do code motion otherwise we do code motion. >> if there are lesser live out in best_bb, there is lesser chance >> of interfering live ranges and hence moving statements in best_bb >> will not increase register pressure. >> >> If the liveout of best_bb is greater than live-out of early_bb, >> moving statements in best_bb will increase chances of more interfering >> live ranges and hence increase in register pressure. >> >> This is how the heuristics is defined. > > I don't think they will work. Do you have any numbers? > My heuristics will work as mentioned above. I will run the spec benchmarks and will able to give performance numbers. Thanks & Regards Ajit >> >>> >>> You are replacing the hot/cold heuristic. >> >>> >>> IMHO the sinking pass is the totally wrong place to do anything >>> about register pressure. You are trying to solve a scheduling >>> problem by just looking at a single stmt. >>> >> >> bb->count from profile.cc are prone to errors as you have >> mentioned in previous mails. Main bottlenecks with code >> motion is increase in register pressure as that counts to >> spills in later phases of the compiler backend. >> >> Calculation of best_bb based of immediate dominator should >> consider register pressure instead of hold cold regions as that >> would effect code generation. >> >> If there is increase in register pressure with code motion and if >> we are moving into colder regions, that wont improve code generations. >> >> Hold/cold should be criteria but not the improved criteria with >> code motion. >> >> We should consider register pressure with code motion than hot/cold >> regions. >> >> Thanks & Regards >> Ajit >> >>> Richard. >>> >>>> Thanks & Regards >> >> >>>> Ajit >>>> >>>> tree-optimization: Add register pressure heuristics >>>> >>>> Currently code sinking heuristics are based on profile data like >>>> basic block count and sink frequency threshold. We have removed >>>> such heuristics to add register pressure heuristics based on >>>> live-in and live-out of early blocks and immediate dominator of >>>> use blocks. >>>> >>>> High register pressure region is the region where there are live-in of >>>> early blocks that has been modified by the early block. If there are >>>> modification of the variables in best block that are live-in in early >>>> block that are live-out of best block. >>>> >>>> 2023-11-03 Ajit Kumar Agarwal <aagar...@linux.ibm.com> >>>> >>>> gcc/ChangeLog: >>>> >>>> * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p >>>> as paramters. >>>> (sink_code_in_bb): Ditto. >>>> (select_best_block): Add register pressure heuristics to select >>>> the best blocks in the immediate dominator for same loop nest >>>> depth. >>>> (execute): Add live range analysis. >>>> (additional_var_map): New function. >>>> * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand >>>> tests on ssa_names. >>>> (verify_live_on_entry): Ditto. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> * gcc.dg/tree-ssa/ssa-sink-21.c: New test. >>>> * gcc.dg/tree-ssa/ssa-sink-22.c: New test. >>>> --- >>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 ++++ >>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++++ >>>> gcc/tree-ssa-live.cc | 11 ++- >>>> gcc/tree-ssa-sink.cc | 93 ++++++++++++++------- >>>> 4 files changed, 104 insertions(+), 34 deletions(-) >>>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c >>>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c >>>> >>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c >>>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c >>>> new file mode 100644 >>>> index 00000000000..d3b79ca5803 >>>> --- /dev/null >>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c >>>> @@ -0,0 +1,15 @@ >>>> +/* { dg-do compile } */ >>>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */ >>>> +void bar(); >>>> +int j; >>>> +void foo(int a, int b, int c, int d, int e, int f) >>>> +{ >>>> + int l; >>>> + l = a + b + c + d +e + f; >>>> + if (a != 5) >>>> + { >>>> + bar(); >>>> + j = l; >>>> + } >>>> +} >>>> +/* { dg-final { scan-tree-dump >>>> {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */ >>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c >>>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c >>>> new file mode 100644 >>>> index 00000000000..84e7938c54f >>>> --- /dev/null >>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c >>>> @@ -0,0 +1,19 @@ >>>> +/* { dg-do compile } */ >>>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */ >>>> +void bar(); >>>> +int j, x; >>>> +void foo(int a, int b, int c, int d, int e, int f) >>>> +{ >>>> + int l; >>>> + l = a + b + c + d +e + f; >>>> + if (a != 5) >>>> + { >>>> + bar(); >>>> + if (b != 3) >>>> + x = 3; >>>> + else >>>> + x = 5; >>>> + j = l; >>>> + } >>>> +} >>>> +/* { dg-final { scan-tree-dump >>>> {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */ >>>> diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc >>>> index f06daf23035..998fe588278 100644 >>>> --- a/gcc/tree-ssa-live.cc >>>> +++ b/gcc/tree-ssa-live.cc >>>> @@ -1141,7 +1141,8 @@ set_var_live_on_entry (tree ssa_name, >>>> tree_live_info_p live) >>>> def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun); >>>> >>>> /* An undefined local variable does not need to be very alive. */ >>>> - if (ssa_undefined_value_p (ssa_name, false)) >>>> + if (virtual_operand_p (ssa_name) >>>> + || ssa_undefined_value_p (ssa_name, false)) >>>> return; >>>> >>>> /* Visit each use of SSA_NAME and if it isn't in the same block as the >>>> def, >>>> @@ -1540,7 +1541,6 @@ debug (tree_live_info_d *ptr) >>>> >>>> >>>> /* Verify that the info in LIVE matches the current cfg. */ >>>> - >>>> static void >>>> verify_live_on_entry (tree_live_info_p live) >>>> { >>>> @@ -1569,11 +1569,13 @@ verify_live_on_entry (tree_live_info_p live) >>>> tree d = NULL_TREE; >>>> bitmap loe; >>>> var = partition_to_var (map, i); >>>> + if (var == NULL_TREE) >>>> + continue; >>>> stmt = SSA_NAME_DEF_STMT (var); >>>> tmp = gimple_bb (stmt); >>>> + >>>> if (SSA_NAME_VAR (var)) >>>> d = ssa_default_def (cfun, SSA_NAME_VAR (var)); >>>> - >>>> loe = live_on_entry (live, e->dest); >>>> if (loe && bitmap_bit_p (loe, i)) >>>> { >>>> @@ -1614,7 +1616,8 @@ verify_live_on_entry (tree_live_info_p live) >>>> { >>>> /* An undefined local variable does not need to be very >>>> alive. */ >>>> - if (ssa_undefined_value_p (var, false)) >>>> + if (virtual_operand_p (var) >>>> + || ssa_undefined_value_p (var, false)) >>>> continue; >>>> >>>> /* The only way this var shouldn't be marked live on entry >>>> is >>>> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc >>>> index a360c5cdd6e..d0c9ef1ab86 100644 >>>> --- a/gcc/tree-ssa-sink.cc >>>> +++ b/gcc/tree-ssa-sink.cc >>>> @@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, >>>> bool *debug_stmts) >>>> tree, return the best basic block between them (inclusive) to place >>>> statements. >>>> >>>> + The best basic block should be an immediate dominator of >>>> + best basic block if we've moved to same loop nest. >>>> + >>>> We want the most control dependent block in the shallowest loop nest. >>>> >>>> If the resulting block is in a shallower loop nest, then use it. Else >>>> @@ -191,24 +194,23 @@ nearest_common_dominator_of_uses (def_operand_p >>>> def_p, bool *debug_stmts) >>>> static basic_block >>>> select_best_block (basic_block early_bb, >>>> basic_block late_bb, >>>> - gimple *stmt) >>>> + gimple *stmt, >>>> + tree_live_info_p &live) >>>> { >>>> basic_block best_bb = late_bb; >>>> basic_block temp_bb = late_bb; >>>> - int threshold; >>>> >>>> while (temp_bb != early_bb) >>>> { >>>> /* If we've moved into a lower loop nest, then that becomes >>>> our best block. */ >>>> - if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb)) >>>> + if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb)) >>>> best_bb = temp_bb; >>>> >>>> /* Walk up the dominator tree, hopefully we'll find a shallower >>>> loop nest. */ >>>> temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb); >>>> } >>>> - >>>> /* Placing a statement before a setjmp-like function would be invalid >>>> (it cannot be reevaluated when execution follows an abnormal edge). >>>> If we selected a block with abnormal predecessors, just punt. */ >>>> @@ -233,24 +235,36 @@ select_best_block (basic_block early_bb, >>>> && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, >>>> best_bb)) >>>> return early_bb; >>>> >>>> - /* Get the sinking threshold. If the statement to be moved has memory >>>> - operands, then increase the threshold by 7% as those are even more >>>> - profitable to avoid, clamping at 100%. */ >>>> - threshold = param_sink_frequency_threshold; >>>> - if (gimple_vuse (stmt) || gimple_vdef (stmt)) >>>> - { >>>> - threshold += 7; >>>> - if (threshold > 100) >>>> - threshold = 100; >>>> - } >>>> + int best_bb_liveout_cnt >>>> + = bitmap_count_bits (&live->liveout[best_bb->index]); >>>> + int early_bb_liveout_cnt >>>> + = bitmap_count_bits (&live->liveout[early_bb->index]); >>>> + int early_livein_cnt >>>> + = bitmap_count_bits (&live->livein[early_bb->index]); >>>> + >>>> + /* High register pressure region is the region where there are live-in >>>> of >>>> + early blocks that has been modified by the early block. If there are >>>> + modification of the variables in best block that are live-in in early >>>> + block that are live-out of best block. */ >>>> + bool live_in_rgn = (early_livein_cnt != 0 >>>> + && early_bb_liveout_cnt <= early_livein_cnt); >>>> + >>>> + bool high_reg_pressure_rgn >>>> + = ((best_bb_liveout_cnt != 0 || live_in_rgn) >>>> + && best_bb_liveout_cnt <= early_bb_liveout_cnt); >>>> >>>> /* If BEST_BB is at the same nesting level, then require it to have >>>> - significantly lower execution frequency to avoid gratuitous >>>> movement. */ >>>> + significantly high register pressure region to avoid gratuitous >>>> + movement. */ >>>> if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb) >>>> - /* If result of comparsion is unknown, prefer EARLY_BB. >>>> - Thus use !(...>=..) rather than (...<...) */ >>>> - && !(best_bb->count * 100 >= early_bb->count * threshold)) >>>> - return best_bb; >>>> + && high_reg_pressure_rgn) >>>> + { >>>> + /* Avoid sinking to immediate dominator if the statement to be moved >>>> + has memory operand and same loop nest. */ >>>> + if (best_bb != late_bb && gimple_vuse (stmt)) >>>> + return late_bb; >>>> + return best_bb; >>>> + } >>>> >>>> /* No better block found, so return EARLY_BB, which happens to be the >>>> statement's original block. */ >>>> @@ -265,7 +279,8 @@ select_best_block (basic_block early_bb, >>>> static bool >>>> statement_sink_location (gimple *stmt, basic_block frombb, >>>> gimple_stmt_iterator *togsi, bool *zero_uses_p, >>>> - virtual_operand_live &vop_live) >>>> + virtual_operand_live &vop_live, >>>> + tree_live_info_p &live) >>>> { >>>> gimple *use; >>>> use_operand_p one_use = NULL_USE_OPERAND_P; >>>> @@ -413,7 +428,7 @@ statement_sink_location (gimple *stmt, basic_block >>>> frombb, >>>> if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb)) >>>> return false; >>>> >>>> - commondom = select_best_block (frombb, commondom, stmt); >>>> + commondom = select_best_block (frombb, commondom, stmt, live); >>>> >>>> if (commondom == frombb) >>>> return false; >>>> @@ -430,19 +445,17 @@ statement_sink_location (gimple *stmt, basic_block >>>> frombb, >>>> continue; >>>> break; >>>> } >>>> + >>>> use = USE_STMT (one_use); >>>> >>>> if (gimple_code (use) != GIMPLE_PHI) >>>> { >>>> - sinkbb = select_best_block (frombb, gimple_bb (use), stmt); >>>> + sinkbb = select_best_block (frombb, gimple_bb (use), stmt, live); >>>> >>>> if (sinkbb == frombb) >>>> return false; >>>> >>>> - if (sinkbb == gimple_bb (use)) >>>> - *togsi = gsi_for_stmt (use); >>>> - else >>>> - *togsi = gsi_after_labels (sinkbb); >>>> + *togsi = gsi_after_labels (sinkbb); >>>> >>>> return true; >>>> } >>>> @@ -454,7 +467,7 @@ statement_sink_location (gimple *stmt, basic_block >>>> frombb, >>>> if (!sinkbb) >>>> return false; >>>> >>>> - sinkbb = select_best_block (frombb, sinkbb, stmt); >>>> + sinkbb = select_best_block (frombb, sinkbb, stmt, live); >>>> if (!sinkbb || sinkbb == frombb) >>>> return false; >>>> >>>> @@ -643,7 +656,8 @@ sink_common_stores_to_bb (basic_block bb) >>>> /* Perform code sinking on BB */ >>>> >>>> static unsigned >>>> -sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live) >>>> +sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live, >>>> + tree_live_info_p &live) >>>> { >>>> gimple_stmt_iterator gsi; >>>> edge_iterator ei; >>>> @@ -670,7 +684,8 @@ sink_code_in_bb (basic_block bb, virtual_operand_live >>>> &vop_live) >>>> gimple_stmt_iterator togsi; >>>> bool zero_uses_p; >>>> >>>> - if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, >>>> vop_live)) >>>> + if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, >>>> + vop_live, live)) >>>> { >>>> gimple_stmt_iterator saved = gsi; >>>> if (!gsi_end_p (gsi)) >>>> @@ -814,6 +829,15 @@ private: >>>> bool unsplit_edges; >>>> }; // class pass_sink_code >>>> >>>> +void additional_var_map (var_map &map) >>>> +{ >>>> + map->bmp_bbs = BITMAP_ALLOC (NULL); >>>> + map->outofssa_p = false; >>>> + basic_block bb; >>>> + FOR_EACH_BB_FN (bb, cfun) >>>> + bitmap_set_bit (map->bmp_bbs, bb->index); >>>> +} >>>> + >>>> unsigned int >>>> pass_sink_code::execute (function *fun) >>>> { >>>> @@ -827,12 +851,21 @@ pass_sink_code::execute (function *fun) >>>> calculate_dominance_info (CDI_DOMINATORS); >>>> >>>> virtual_operand_live vop_live; >>>> + var_map map = init_var_map (num_ssa_names); >>>> + additional_var_map (map); >>>> + tree_live_info_p live = calculate_live_ranges (map, true); >>>> >>>> int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun)); >>>> int n = inverted_rev_post_order_compute (fun, rpo); >>>> + >>>> for (int i = 0; i < n; ++i) >>>> - todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live); >>>> + todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live, >>>> live); >>>> free (rpo); >>>> + if (live) >>>> + delete_tree_live_info (live); >>>> + >>>> + if (map) >>>> + delete_var_map (map); >>>> >>>> statistics_counter_event (fun, "Sunk statements", sink_stats.sunk); >>>> statistics_counter_event (fun, "Commoned stores", sink_stats.commoned); >>>> -- >>>> 2.39.3 >>>> >>>>