I think Manolis's patches are all in, time to revisit this one?



On 6/12/24 1:54 AM, Robin Dapp wrote:
Hmm, ok.  The bit that confused me most was:

   if (last_needs_comparison != -1)
     {
       end_sequence ();
       start_sequence ();
       ...
     }

which implied that the second attempt was made conditionally.
It seems like it's always used and is an inherent part of the
algorithm.

If the problem is tracking liveness, wouldn't it be better to
iterate over the "then" block in reverse order?  We would start
with the liveness set for the join block and update as we move
backwards through the "then" block.  This liveness set would
tell us whether the current instruction needs to preserve a
particular register.  That should make it possible to do the
transformation in one step, and so avoid the risk that the
second attempt does something that is unexpectedly different
from the first attempt.

I agree that the current approach is rather cumbersome.  Indeed
the second attempt was conditional at first and I changed it to
be unconditional after some patch iterations.
Your reverse-order idea sounds like it should work.  To further
clean up the algorithm we could also make it more explicit
that a "cmov" depends on either the condition or the CC and
basically track two separate paths through the block, one CC
path and one "condition" path.

I can surely do that as a follow up.  It might conflict with
Manolis's changes, though, so his work should probably be in
first.

FWIW, the reason for asking was that it seemed safer to pass
use_cond_earliest back from noce_convert_multiple_sets_1
to noce_convert_multiple_sets, as another parameter,
and then do the adjustment around noce_convert_multiple_sets's
call to targetm.noce_conversion_profitable_p.  That would avoid
the new for a new if_info field, which in turn would make it
less likely that stale information is carried over from one attempt
to the next (e.g. if other ifcvt techniques end up using the same
field in future).

Would something like the attached v4 be OK that uses a parameter
instead (I mean without having refactored the full algorithm)?
At least I changed the comment before the second attempt to
hopefully cause a tiny bit less confusion :)
I haven't fully bootstrapped it yet.

Regards
  Robin

Before noce_find_if_block processes a block it sets up an if_info
structure that holds the original costs.  At that point the costs of
the then/else blocks have not been added so we only care about the
"if" cost.

The code originally used BRANCH_COST for that but was then changed
to COST_N_INSNS (2) - a compare and a jump.

This patch computes the jump costs via
   insn_cost (if_info.jump, ...)
under the assumption that the target takes BRANCH_COST into account
when costing a jump instruction.

In noce_convert_multiple_sets, we keep track of the need for the initial
CC comparison.  If we needed it for the generated sequence we add its
cost before default_noce_conversion_profitable_p.

gcc/ChangeLog:

        * ifcvt.cc (noce_convert_multiple_sets):  Define
        use_cond_earliest and adjust original cost if needed.
        (noce_convert_multiple_sets_1): Add param use_cond_earliest.
        (noce_process_if_block): Do not subtract CC cost anymore.
        (noce_find_if_block): Use insn_cost for costing jump insn.
---
  gcc/ifcvt.cc | 79 +++++++++++++++++++++++++++++-----------------------
  1 file changed, 44 insertions(+), 35 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 58ed42673e5..2854eea7702 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -105,7 +105,8 @@ static bool noce_convert_multiple_sets_1 (struct 
noce_if_info *,
                                          hash_map<rtx_insn *, int> *,
                                          auto_vec<rtx> *,
                                          auto_vec<rtx> *,
-                                         auto_vec<rtx_insn *> *, int *);
+                                         auto_vec<rtx_insn *> *,
+                                         int *, bool *);
  
  /* Count the number of non-jump active insns in BB.  */
@@ -3502,30 +3503,28 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) int last_needs_comparison = -1; + bool use_cond_earliest = false;
+
    bool ok = noce_convert_multiple_sets_1
      (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
-     &unmodified_insns, &last_needs_comparison);
+     &unmodified_insns, &last_needs_comparison, &use_cond_earliest);
    if (!ok)
        return false;
- /* If there are insns that overwrite part of the initial
-     comparison, we can still omit creating temporaries for
-     the last of them.
-     As the second try will always create a less expensive,
-     valid sequence, we do not need to compare and can discard
-     the first one.  */
-  if (last_needs_comparison != -1)
-    {
-      end_sequence ();
-      start_sequence ();
-      ok = noce_convert_multiple_sets_1
-       (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
-        &unmodified_insns, &last_needs_comparison);
-      /* Actually we should not fail anymore if we reached here,
-        but better still check.  */
-      if (!ok)
-         return false;
-    }
+  /* Always perform a second attempt that uses information gathered in the
+     first.  At least we can omit creating temporaries until we definitely
+     need them.  The sequence created in the second attempt is never worse
+     than the first.  */
+
+  end_sequence ();
+  start_sequence ();
+  ok = noce_convert_multiple_sets_1
+    (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
+     &unmodified_insns, &last_needs_comparison, &use_cond_earliest);
+  /* Actually we should not fail anymore if we reached here,
+     but better still check.  */
+  if (!ok)
+    return false;
/* We must have seen some sort of insn to insert, otherwise we were
       given an empty BB to convert, and we can't handle that.  */
@@ -3539,12 +3538,23 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
    /* Actually emit the sequence if it isn't too expensive.  */
    rtx_insn *seq = get_insns ();
+ /* If the created sequence does not use cond_earliest (but the jump
+     does) add its cost to the original_cost before comparing costs.  */
+  unsigned int original_cost = if_info->original_cost;
+  if (if_info->jump != if_info->cond_earliest && !use_cond_earliest)
+    if_info->original_cost += insn_cost (if_info->cond_earliest,
+                                        if_info->speed_p);
+
    if (!targetm.noce_conversion_profitable_p (seq, if_info))
      {
+      if_info->original_cost = original_cost;
        end_sequence ();
        return false;
      }
+ /* Restore the original cost in case we do not succeed below. */
+  if_info->original_cost = original_cost;
+
    for (insn = seq; insn; insn = NEXT_INSN (insn))
      set_used_flags (insn);
@@ -3620,7 +3630,8 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
                              auto_vec<rtx> *targets,
                              auto_vec<rtx> *temporaries,
                              auto_vec<rtx_insn *> *unmodified_insns,
-                             int *last_needs_comparison)
+                             int *last_needs_comparison,
+                             bool *use_cond_earliest)
  {
    basic_block then_bb = if_info->then_bb;
    rtx_insn *jump = if_info->jump;
@@ -3644,6 +3655,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info 
*if_info,
    unmodified_insns->truncate (0);
bool second_try = *last_needs_comparison != -1;
+  *use_cond_earliest = false;
FOR_BB_INSNS (then_bb, insn)
      {
@@ -3780,6 +3792,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info 
*if_info,
          temp_dest = temp_dest2;
          if (!second_try && read_comparison)
            *last_needs_comparison = count;
+         *use_cond_earliest = true;
        }
        else
        {
@@ -3931,16 +3944,13 @@ noce_process_if_block (struct noce_if_info *if_info)
       to calculate a value for x.
       ??? For future expansion, further expand the "multiple X" rules.  */
- /* First look for multiple SETS. The original costs already include
-     a base cost of COSTS_N_INSNS (2): one instruction for the compare
-     (which we will be needing either way) and one instruction for the
-     branch.  When comparing costs we want to use the branch instruction
-     cost and the sets vs. the cmovs generated here.  Therefore subtract
-     the costs of the compare before checking.
-     ??? Actually, instead of the branch instruction costs we might want
-     to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
+  /* First look for multiple SETS.
+     The original costs already include costs for the jump insn as well
+     as for a CC comparison if there is any.
+     If a target re-uses the existing CC comparison we keep track of that
+     and add the costs before default noce_conversion_profitable_p.  */
- unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
+  unsigned potential_cost = if_info->original_cost;
    unsigned old_cost = if_info->original_cost;
    if (!else_bb
        && HAVE_conditional_move
@@ -4703,11 +4713,10 @@ noce_find_if_block (basic_block test_bb, edge 
then_edge, edge else_edge,
      = targetm.max_noce_ifcvt_seq_cost (then_edge);
    /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
       that they are valid to transform.  We can't easily get back to the insn
-     for COND (and it may not exist if we had to canonicalize to get COND),
-     and jump_insns are always given a cost of 1 by seq_cost, so treat
-     both instructions as having cost COSTS_N_INSNS (1).  */
-  if_info.original_cost = COSTS_N_INSNS (2);
-
+     for COND (and it may not exist if we had to canonicalize to get COND).
+     It is assumed that the costs of a jump insn are dependent on the
+     branch costs.  */
+  if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p);
/* Do the real work. */

Reply via email to