Re: [PATCH] ifcvt: Clarify if_info.original_cost.

Robin Dapp Wed, 12 Jun 2024 00:54:38 -0700

> Hmm, ok.  The bit that confused me most was:
> 
>   if (last_needs_comparison != -1)
>     {
>       end_sequence ();
>       start_sequence ();
>       ...
>     }
> 
> which implied that the second attempt was made conditionally.
> It seems like it's always used and is an inherent part of the
> algorithm.
> 
> If the problem is tracking liveness, wouldn't it be better to
> iterate over the "then" block in reverse order?  We would start
> with the liveness set for the join block and update as we move
> backwards through the "then" block.  This liveness set would
> tell us whether the current instruction needs to preserve a
> particular register.  That should make it possible to do the
> transformation in one step, and so avoid the risk that the
> second attempt does something that is unexpectedly different
> from the first attempt.


I agree that the current approach is rather cumbersome.  Indeed
the second attempt was conditional at first and I changed it to
be unconditional after some patch iterations.
Your reverse-order idea sounds like it should work.  To further
clean up the algorithm we could also make it more explicit
that a "cmov" depends on either the condition or the CC and
basically track two separate paths through the block, one CC
path and one "condition" path.

I can surely do that as a follow up.  It might conflict with
Manolis's changes, though, so his work should probably be in
first.

> FWIW, the reason for asking was that it seemed safer to pass
> use_cond_earliest back from noce_convert_multiple_sets_1
> to noce_convert_multiple_sets, as another parameter,
> and then do the adjustment around noce_convert_multiple_sets's
> call to targetm.noce_conversion_profitable_p.  That would avoid
> the new for a new if_info field, which in turn would make it
> less likely that stale information is carried over from one attempt
> to the next (e.g. if other ifcvt techniques end up using the same
> field in future).

Would something like the attached v4 be OK that uses a parameter
instead (I mean without having refactored the full algorithm)?
At least I changed the comment before the second attempt to
hopefully cause a tiny bit less confusion :)
I haven't fully bootstrapped it yet.

Regards
 Robin

Before noce_find_if_block processes a block it sets up an if_info
structure that holds the original costs.  At that point the costs of
the then/else blocks have not been added so we only care about the
"if" cost.

The code originally used BRANCH_COST for that but was then changed
to COST_N_INSNS (2) - a compare and a jump.

This patch computes the jump costs via
  insn_cost (if_info.jump, ...)
under the assumption that the target takes BRANCH_COST into account
when costing a jump instruction.

In noce_convert_multiple_sets, we keep track of the need for the initial
CC comparison.  If we needed it for the generated sequence we add its
cost before default_noce_conversion_profitable_p.

gcc/ChangeLog:

        * ifcvt.cc (noce_convert_multiple_sets):  Define
        use_cond_earliest and adjust original cost if needed.
        (noce_convert_multiple_sets_1): Add param use_cond_earliest.
        (noce_process_if_block): Do not subtract CC cost anymore.
        (noce_find_if_block): Use insn_cost for costing jump insn.
---
 gcc/ifcvt.cc | 79 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 44 insertions(+), 35 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 58ed42673e5..2854eea7702 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -105,7 +105,8 @@ static bool noce_convert_multiple_sets_1 (struct 
noce_if_info *,
                                          hash_map<rtx_insn *, int> *,
                                          auto_vec<rtx> *,
                                          auto_vec<rtx> *,
-                                         auto_vec<rtx_insn *> *, int *);
+                                         auto_vec<rtx_insn *> *,
+                                         int *, bool *);
 
 /* Count the number of non-jump active insns in BB.  */
 
@@ -3502,30 +3503,28 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
 
   int last_needs_comparison = -1;
 
+  bool use_cond_earliest = false;
+
   bool ok = noce_convert_multiple_sets_1
     (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
-     &unmodified_insns, &last_needs_comparison);
+     &unmodified_insns, &last_needs_comparison, &use_cond_earliest);
   if (!ok)
       return false;
 
-  /* If there are insns that overwrite part of the initial
-     comparison, we can still omit creating temporaries for
-     the last of them.
-     As the second try will always create a less expensive,
-     valid sequence, we do not need to compare and can discard
-     the first one.  */
-  if (last_needs_comparison != -1)
-    {
-      end_sequence ();
-      start_sequence ();
-      ok = noce_convert_multiple_sets_1
-       (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
-        &unmodified_insns, &last_needs_comparison);
-      /* Actually we should not fail anymore if we reached here,
-        but better still check.  */
-      if (!ok)
-         return false;
-    }
+  /* Always perform a second attempt that uses information gathered in the
+     first.  At least we can omit creating temporaries until we definitely
+     need them.  The sequence created in the second attempt is never worse
+     than the first.  */
+
+  end_sequence ();
+  start_sequence ();
+  ok = noce_convert_multiple_sets_1
+    (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
+     &unmodified_insns, &last_needs_comparison, &use_cond_earliest);
+  /* Actually we should not fail anymore if we reached here,
+     but better still check.  */
+  if (!ok)
+    return false;
 
   /* We must have seen some sort of insn to insert, otherwise we were
      given an empty BB to convert, and we can't handle that.  */
@@ -3539,12 +3538,23 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
   /* Actually emit the sequence if it isn't too expensive.  */
   rtx_insn *seq = get_insns ();
 
+  /* If the created sequence does not use cond_earliest (but the jump
+     does) add its cost to the original_cost before comparing costs.  */
+  unsigned int original_cost = if_info->original_cost;
+  if (if_info->jump != if_info->cond_earliest && !use_cond_earliest)
+    if_info->original_cost += insn_cost (if_info->cond_earliest,
+                                        if_info->speed_p);
+
   if (!targetm.noce_conversion_profitable_p (seq, if_info))
     {
+      if_info->original_cost = original_cost;
       end_sequence ();
       return false;
     }
 
+  /* Restore the original cost in case we do not succeed below.  */
+  if_info->original_cost = original_cost;
+
   for (insn = seq; insn; insn = NEXT_INSN (insn))
     set_used_flags (insn);
 
@@ -3620,7 +3630,8 @@ noce_convert_multiple_sets_1 (struct noce_if_info 
*if_info,
                              auto_vec<rtx> *targets,
                              auto_vec<rtx> *temporaries,
                              auto_vec<rtx_insn *> *unmodified_insns,
-                             int *last_needs_comparison)
+                             int *last_needs_comparison,
+                             bool *use_cond_earliest)
 {
   basic_block then_bb = if_info->then_bb;
   rtx_insn *jump = if_info->jump;
@@ -3644,6 +3655,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info 
*if_info,
   unmodified_insns->truncate (0);
 
   bool second_try = *last_needs_comparison != -1;
+  *use_cond_earliest = false;
 
   FOR_BB_INSNS (then_bb, insn)
     {
@@ -3780,6 +3792,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info 
*if_info,
          temp_dest = temp_dest2;
          if (!second_try && read_comparison)
            *last_needs_comparison = count;
+         *use_cond_earliest = true;
        }
       else
        {
@@ -3931,16 +3944,13 @@ noce_process_if_block (struct noce_if_info *if_info)
      to calculate a value for x.
      ??? For future expansion, further expand the "multiple X" rules.  */
 
-  /* First look for multiple SETS.  The original costs already include
-     a base cost of COSTS_N_INSNS (2): one instruction for the compare
-     (which we will be needing either way) and one instruction for the
-     branch.  When comparing costs we want to use the branch instruction
-     cost and the sets vs. the cmovs generated here.  Therefore subtract
-     the costs of the compare before checking.
-     ??? Actually, instead of the branch instruction costs we might want
-     to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
+  /* First look for multiple SETS.
+     The original costs already include costs for the jump insn as well
+     as for a CC comparison if there is any.
+     If a target re-uses the existing CC comparison we keep track of that
+     and add the costs before default noce_conversion_profitable_p.  */
 
-  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
+  unsigned potential_cost = if_info->original_cost;
   unsigned old_cost = if_info->original_cost;
   if (!else_bb
       && HAVE_conditional_move
@@ -4703,11 +4713,10 @@ noce_find_if_block (basic_block test_bb, edge 
then_edge, edge else_edge,
     = targetm.max_noce_ifcvt_seq_cost (then_edge);
   /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
      that they are valid to transform.  We can't easily get back to the insn
-     for COND (and it may not exist if we had to canonicalize to get COND),
-     and jump_insns are always given a cost of 1 by seq_cost, so treat
-     both instructions as having cost COSTS_N_INSNS (1).  */
-  if_info.original_cost = COSTS_N_INSNS (2);
-
+     for COND (and it may not exist if we had to canonicalize to get COND).
+     It is assumed that the costs of a jump insn are dependent on the
+     branch costs.  */
+  if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p);
 
   /* Do the real work.  */
 
-- 
2.45.1

Re: [PATCH] ifcvt: Clarify if_info.original_cost.

Reply via email to