On 10/31/24 12:29 PM, Andrew Carlotti wrote:
This pass is used to optimise assignments to the FPMR register in
aarch64.  I chose to implement this as a middle-end pass because it
mostly reuses the existing RTL PRE code within gcse.cc.

Compared to RTL PRE, the key difference in this new pass is that we
insert new writes directly to the destination hardreg, instead of
writing to a new pseudo-register and copying the result later.  This
requires changes to the analysis portion of the pass, because sets
cannot be moved before existing instructions that set, use or clobber
the hardreg, and the value becomes unavailable after any uses or
clobbers of the hardreg.

This patch would currently break any debug instructions that use the
value of fpmr in a region of code where that value is changed by this
pass.  I haven't worked out the best way to fix this, but I suspect the
issue is uncommon and tricky enough that it would be best to just drop
those debug instructions.

I've bootstrapped and regression tested this on aarch64, and it should be NFC
on other targets.  Aside from this, my testing so far has involved hacking in a
single FP8 intrinsic and testing various parameters and control flow
structures, and checking both the codegen and the LCM bitmaps.  I intend to
write better and more comprehensive tests once there are some real intrinsic
implementations available to use.


Is this approach good?  Apart from fixing the debug instructions and
adding tests, is there anything else I need to change?


gcc/ChangeLog:

        * config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro.
        * gcse.cc (doing_hardreg_pre_p): New global variable.
        (current_hardreg_regno): Ditto.
        (compute_local_properties): Unset transp for hardreg clobbers.
        (prune_hardreg_uses): New.
        (want_to_gcse_p): Always return true for hardreg PRE.
        (hash_scan_set): Add checks for hardreg uses/clobbers.
        (oprs_unchanged_p): Disable load motion for hardreg PRE pass.
        (record_last_mem_set_info): Ditto.
        (compute_hash_table_work): Record hardreg uses.
        (prune_expressions): Mark hardreg sets as call-clobbered.
        (compute_pre_data): Add call to prune_hardreg_uses.
        (pre_expr_reaches_here_p_work): Add comment.
        (insert_insn_start_basic_block): New functions.
        (pre_edge_insert): Don't add hardreg sets to predecessor block.
        (pre_delete): Use hardreg for the reaching reg.
        (pre_gcse): Don't insert copies for hardreg PRE.
        (one_pre_gcse_pass): Disable load motion for hardreg PRE pass.
        (execute_hardreg_pre): New.
        (class pass_hardreg_pre): New.
        (pass_hardreg_pre::gate): New.
        (make_pass_hardreg_pre): New.
        * passes.def (pass_hardreg_pre): New pass.
        * tree-pass.h (make_pass_hardreg_pre): New.
So at a 30k foot level, one thing to be very leery of is extending the lifetime of any hard register. It's probably not a big deal on aarch, but it can cause all kinds of headaches on other targets.

Essentially you probably need to avoid PRE on a hard register that's in a likely spilled class.
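
Something along these lines in the gate is what I have in mind (just a sketch; current_hardreg_regno is the patch's name for the register being optimized):

  /* Hypothetical guard: punt on hardreg PRE when the register's class is
     likely spilled, so we never extend the live range of a register the
     allocator is already struggling with.  */
  if (targetm.class_likely_spilled_p (REGNO_REG_CLASS (current_hardreg_regno)))
    return false;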





diff --git a/gcc/gcse.cc b/gcc/gcse.cc
index 31b92f30fa1ba6c519429d4b7bc55547b2d71c01..ce4ebe420c02d78fcde3144eed595e22212aaa0b 100644
--- a/gcc/gcse.cc
+++ b/gcc/gcse.cc

@@ -693,10 +698,29 @@ compute_local_properties (sbitmap *transp, sbitmap *comp, sbitmap *antloc,
             We start by assuming all are transparent [none are killed], and
             then reset the bits for those that are.  */
          if (transp)
-           compute_transp (expr->expr, indx, transp,
-                           blocks_with_calls,
-                           modify_mem_list_set,
-                           canon_modify_mem_list);
+           {
+             compute_transp (expr->expr, indx, transp,
+                             blocks_with_calls,
+                             modify_mem_list_set,
+                             canon_modify_mem_list);
+
+             if (doing_hardreg_pre_p)
+               {
+                 /* We also need to check whether the destination hardreg is
+                    set or call-clobbered in each BB.  We'll check for hardreg
+                    uses later.  */
+                 df_ref def;
+                 for (def = DF_REG_DEF_CHAIN (current_hardreg_regno);
+                      def;
+                      def = DF_REF_NEXT_REG (def))
+                   bitmap_clear_bit (transp[DF_REF_BB (def)->index], indx);
+
+                 bitmap_iterator bi;
+                 unsigned bb_index;
+                 EXECUTE_IF_SET_IN_BITMAP (blocks_with_calls, 0, bb_index, bi)
+                   bitmap_clear_bit (transp[bb_index], indx);
+               }
+           }
It's been a long time since I looked at the code, but is there code already in the pass to walk down the FUSAGE notes attached to calls? You'll definitely need that since it can have uses/clobbers of hard regs that are potentially outside the set normally clobbered by calls.
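
For reference, I'd expect the call scan to do something shaped like this (a sketch only; record_last_reg_set_info and current_hardreg_regno are the patch's names, and treating FUSAGE uses the same as clobbers here is my assumption):

  /* Walk the FUSAGE list attached to the call; it can contain uses and
     clobbers of hard regs outside the set normally clobbered by calls.  */
  for (rtx link = CALL_INSN_FUNCTION_USAGE (insn);
       link;
       link = XEXP (link, 1))
    {
      rtx usage = XEXP (link, 0);
      if ((GET_CODE (usage) == USE || GET_CODE (usage) == CLOBBER)
          && REG_P (XEXP (usage, 0))
          && REGNO (XEXP (usage, 0)) == current_hardreg_regno)
        record_last_reg_set_info (insn, current_hardreg_regno);
    }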




          /* The occurrences recorded in antic_occr are exactly those that
             we want to set to nonzero in ANTLOC.  */
@@ -728,6 +752,37 @@ compute_local_properties (sbitmap *transp, sbitmap *comp, sbitmap *antloc,
        }
      }
  }
+
+/* A hardreg set is not transparent in a block if there are any uses of that
+   hardreg.  This filters the results of compute_local_properties, after the
+   result of that function has been used to define the kills bitmap.
+
+   TRANSP is the destination sbitmap to be updated.
+
+   TABLE controls which hash table to look at.  */
Sorry, that comment doesn't make much sense to me. A use doesn't traditionally impact transparency.
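
If the intent is that an inserted set would change the value seen by an existing use of the hardreg, so uses have to kill the expression for this pass, then I'd expect the pruning to mirror the def-chain walk from the earlier hunk, just over the DF use chains. Roughly (a sketch, not the patch's actual body; indx is the expression's hash table index as above):

  df_ref use;
  for (use = DF_REG_USE_CHAIN (current_hardreg_regno);
       use;
       use = DF_REF_NEXT_REG (use))
    bitmap_clear_bit (transp[DF_REF_BB (use)->index], indx);

If that's the idea, the comment should say explicitly that "transparent" here means "a set of the hardreg can be moved across this block without changing semantics".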


  
  /* Hash table support.  */
@@ -739,6 +794,8 @@ struct reg_avail_info
  };
static struct reg_avail_info *reg_avail_info;
+static basic_block hardreg_last_bb;
+static int hardreg_first_use;
  static basic_block current_bb;
/* See whether X, the source of a set, is something we want to consider for
@@ -747,6 +804,9 @@ static basic_block current_bb;
  static bool
  want_to_gcse_p (rtx x, machine_mode mode, HOST_WIDE_INT *max_distance_ptr)
  {
+  if (doing_hardreg_pre_p)
+    return true;
This seems overly aggressive, though perhaps it's not so bad in practice.
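
A less aggressive variant might at least keep the structural sanity check that's already in gcse.cc rather than accepting every source unconditionally. For example (a sketch):

  if (doing_hardreg_pre_p)
    return can_assign_to_reg_without_clobbers_p (x, mode);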





@@ -1286,12 +1348,33 @@ hash_scan_set (rtx set, rtx_insn *insn, struct gcse_hash_table_d *table)
             able to handle code motion of insns with multiple sets.  */
          bool antic_p = (oprs_anticipatable_p (src, insn)
                          && !multiple_sets (insn));
+         if (doing_hardreg_pre_p)
+           {
+             /* A hardreg assignment is anticipatable only if the hardreg is
+                neither set nor used prior to this assignment.  */
+             auto info = reg_avail_info[current_hardreg_regno];
+             if ((info.last_bb == current_bb
+                  && info.first_set < DF_INSN_LUID (insn))
+                 || (hardreg_last_bb == current_bb
+                     && hardreg_first_use <= DF_INSN_LUID (insn)))
+               antic_p = false;
+           }
+
          /* An expression is not available if its operands are
             subsequently modified, including this insn.  It's also not
             available if this is a branch, because we can't insert
             a set after the branch.  */
          bool avail_p = (oprs_available_p (src, insn)
                          && ! JUMP_P (insn));
+         if (doing_hardreg_pre_p)
+           {
+             /* A hardreg assignment is only available if the hardreg is
+                not set later in the BB.  Uses of the hardreg are allowed. */
+             auto info = reg_avail_info[current_hardreg_regno];
+             if (info.last_bb == current_bb
+                 && info.last_set > DF_INSN_LUID (insn))
+               antic_p = false;
Did you mean to set antic_p here, or should it have been avail_p?
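
Given the comment, I'd have expected:

  if (info.last_bb == current_bb
      && info.last_set > DF_INSN_LUID (insn))
    avail_p = false;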



@@ -1537,6 +1623,18 @@ compute_hash_table_work (struct gcse_hash_table_d *table)
              EXECUTE_IF_SET_IN_HARD_REG_SET (callee_clobbers, 0, regno, hrsi)
                record_last_reg_set_info (insn, regno);
+             if (doing_hardreg_pre_p)
+               {
+                 /* This is covered by the above clobbers, but let's
+                    conservatively make this work as well for hardregs that
+                    are call-used but not call-clobbered.  */
+                 record_last_reg_set_info (insn, current_hardreg_regno);
+
+                 /* Mark this block as containing a call-clobber.  */
+                 bitmap_set_bit (blocks_with_calls,
+                                 BLOCK_FOR_INSN (insn)->index);
+               }
I don't think this is wrong, but isn't it redundant? i.e., shouldn't we have populated that bitmap elsewhere? Normally record_last_mem_set_info sets blocks_with_calls for a call, though I see the ChangeLog disables that path for this pass, which may be why it's repeated here.




@@ -2159,13 +2335,24 @@ pre_edge_insert (struct edge_list *edge_list, struct gcse_expr **index_map)
                        /* We can't insert anything on an abnormal and
                           critical edge, so we insert the insn at the end of
-                          the previous block. There are several alternatives
+                          the previous block.  There are several alternatives
                           detailed in Morgans book P277 (sec 10.5) for
                           handling this situation.  This one is easiest for
-                          now.  */
+                          now.
+                          For hardreg PRE, this would add an unwanted clobber
+                          of the hardreg, so we instead insert in the
+                          successor block, which may be partially redundant
+                          but is at least correct.  */
But if it's abnormal critical, then doesn't this result in clobbers on paths where we didn't have them before? Is that actually safe?

That's what I've got after a first pass over the bits.

Jeff
