On Mon, Dec 02, 2024 at 08:59:20AM -0700, Jeff Law wrote:
> 
> 
> On 10/31/24 12:29 PM, Andrew Carlotti wrote:
> > This pass is used to optimise assignments to the FPMR register in
> > aarch64.  I chose to implement this as a middle-end pass because it
> > mostly reuses the existing RTL PRE code within gcse.cc.
> > 
> > Compared to RTL PRE, the key difference in this new pass is that we
> > insert new writes directly to the destination hardreg, instead of
> > writing to a new pseudo-register and copying the result later.  This
> > requires changes to the analysis portion of the pass, because sets
> > cannot be moved before existing instructions that set, use or clobber
> > the hardreg, and the value becomes unavailable after any uses or
> > clobbers of the hardreg.
> > 
> > This patch would currently break any debug instructions that use the
> > value of fpmr in a region of code where that value is changed by this
> > pass.  I haven't worked out the best way to fix this, but I suspect the
> > issue is uncommon and tricky enough that it would be best to just drop
> > those debug instructions.
> > 
> > I've bootstrapped and regression tested this on aarch64, and it should be
> > NFC on other targets.  Aside from this, my testing so far has involved
> > hacking in a single FP8 intrinsic and testing various parameters and
> > control flow structures, and checking both the codegen and the LCM
> > bitmaps.  I intend to write better and more comprehensive tests once there
> > are some real intrinsic implementations available to use.
> > 
> > 
> > Is this approach good?  Apart from fixing the debug instructions and
> > adding tests, is there anything else I need to change?
> > 
> > 
> > gcc/ChangeLog:
> > 
> >     * config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro.
> >     * gcse.cc (doing_hardreg_pre_p): New global variable.
> >     (current_hardreg_regno): Ditto.
> >     (compute_local_properties): Unset transp for hardreg clobbers.
> >     (prune_hardreg_uses): New.
> >     (want_to_gcse_p): Always return true for hardreg PRE.
> >     (hash_scan_set): Add checks for hardreg uses/clobbers.
> >     (oprs_unchanged_p): Disable load motion for hardreg PRE pass.
> >     (record_last_mem_set_info): Ditto.
> >     (compute_hash_table_work): Record hardreg uses.
> >     (prune_expressions): Mark hardreg sets as call-clobbered.
> >     (compute_pre_data): Add call to prune_hardreg_uses.
> >     (pre_expr_reaches_here_p_work): Add comment.
> >     (insert_insn_start_basic_block): New function.
> >     (pre_edge_insert): Don't add hardreg sets to predecessor block.
> >     (pre_delete): Use hardreg for the reaching reg.
> >     (pre_gcse): Don't insert copies for hardreg PRE.
> >     (one_pre_gcse_pass): Disable load motion for hardreg PRE pass.
> >     (execute_hardreg_pre): New.
> >     (class pass_hardreg_pre): New.
> >     (pass_hardreg_pre::gate): New.
> >     (make_pass_hardreg_pre): New.
> >     * passes.def (pass_hardreg_pre): New pass.
> >     * tree-pass.h (make_pass_hardreg_pre): New.
> So at a 30k foot level, one thing to be very leery of is extending the
> lifetime of any hard register.  It's probably not a big deal on aarch, but
> it can cause all kinds of headaches on other targets.
> 
> Essentially you probably need to avoid PRE on a hard register that's in a
> likely spilled class.

This is not intended to be used for ordinary registers, so that shouldn't be a
concern.  The use case is essentially as a form of mode-switching, where the
active mode is specified by a register that can take arbitrary values at
runtime.
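
To illustrate (a sketch only - the FP8 intrinsics aren't merged yet, so the
intrinsic name here is a placeholder): each FP8 intrinsic call conceptually
writes its fpm operand to FPMR before executing, so in a loop like

  void
  f (float16_t *out, mfloat8_t *a, fpm_t fpm, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = fp8_convert (a[i], fpm);   /* implies "set FPMR, fpm" */
  }

the FPMR value is loop-invariant, and the pass can hoist the single hardreg
set out of the loop instead of re-emitting it before every call.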
 
> > 
> > 
> 
> > diff --git a/gcc/gcse.cc b/gcc/gcse.cc
> > index 31b92f30fa1ba6c519429d4b7bc55547b2d71c01..ce4ebe420c02d78fcde3144eed595e22212aaa0b 100644
> > --- a/gcc/gcse.cc
> > +++ b/gcc/gcse.cc
> 
> > @@ -693,10 +698,29 @@ compute_local_properties (sbitmap *transp, sbitmap *comp, sbitmap *antloc,
> >          We start by assuming all are transparent [none are killed], and
> >          then reset the bits for those that are.  */
> >       if (transp)
> > -       compute_transp (expr->expr, indx, transp,
> > -                       blocks_with_calls,
> > -                       modify_mem_list_set,
> > -                       canon_modify_mem_list);
> > +       {
> > +         compute_transp (expr->expr, indx, transp,
> > +                         blocks_with_calls,
> > +                         modify_mem_list_set,
> > +                         canon_modify_mem_list);
> > +
> > +         if (doing_hardreg_pre_p)
> > +           {
> > +             /* We also need to check whether the destination hardreg is
> > +                set or call-clobbered in each BB.  We'll check for hardreg
> > +                uses later.  */
> > +             df_ref def;
> > +             for (def = DF_REG_DEF_CHAIN (current_hardreg_regno);
> > +                  def;
> > +                  def = DF_REF_NEXT_REG (def))
> > +               bitmap_clear_bit (transp[DF_REF_BB (def)->index], indx);
> > +
> > +             bitmap_iterator bi;
> > +             unsigned bb_index;
> > +             EXECUTE_IF_SET_IN_BITMAP (blocks_with_calls, 0, bb_index, bi)
> > +               bitmap_clear_bit (transp[bb_index], indx);
> > +           }
> > +       }
> It's been a long time since I looked at the code, but is there code already
> in the pass to walk down the FUSAGE notes attached to calls? You'll
> definitely need that since it can have uses/clobbers of hard regs that are
> potentially outside the set normally clobbered by calls.
> 
> 
> 
> 
> >       /* The occurrences recorded in antic_occr are exactly those that
> >          we want to set to nonzero in ANTLOC.  */
> > @@ -728,6 +752,37 @@ compute_local_properties (sbitmap *transp, sbitmap *comp, sbitmap *antloc,
> >     }
> >       }
> >   }
> > +
> > +/* A hardreg set is not transparent in a block if there are any uses of that
> > +   hardreg.  This filters the results of compute_local_properties, after the
> > +   result of that function has been used to define the kills bitmap.
> > +
> > +   TRANSP is the destination sbitmap to be updated.
> > +
> > +   TABLE controls which hash table to look at.  */
> Sorry, that comment doesn't make much sense to me.  A use doesn't
> traditionally impact transparency.

This is not quite the traditional GCSE scenario.

In traditional GCSE/LCM we're just moving an abstract computation (and
temporarily storing the result in a newly-created pseudoreg so that it can't
interfere with existing uses of the destination registers).

In this new hardreg PRE pass we want to move the actual assignment to the
hardreg, so we have to additionally check whether moving the assignment earlier
(and not just moving the computation earlier) would conflict with existing uses
of the hardreg.  The transparency (and antic) bitmaps are used to indicate this
property in LCM; perhaps the name is less accurate when applied to hardreg PRE.
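
Concretely, the filtering looks something like this (a sketch of the new
prune_hardreg_uses, not the verbatim patch): walk the DF use chain of the
hardreg and clear the transparency bit of each block containing a use, just as
the hunk above does for defs and call clobbers.

  static void
  prune_hardreg_uses (sbitmap *transp, struct gcse_hash_table_d *table)
  {
    for (unsigned int i = 0; i < table->size; i++)
      for (struct gcse_expr *expr = table->table[i]; expr;
           expr = expr->next_same_hash)
        {
          int indx = expr->bitmap_index;

          /* Inserting a hardreg set above an existing use would change the
             value that use sees, so such blocks cannot be transparent.  */
          for (df_ref use = DF_REG_USE_CHAIN (current_hardreg_regno);
               use;
               use = DF_REF_NEXT_REG (use))
            bitmap_clear_bit (transp[DF_REF_BB (use)->index], indx);
        }
  }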

> >   
> >   /* Hash table support.  */
> > @@ -739,6 +794,8 @@ struct reg_avail_info
> >   };
> >   static struct reg_avail_info *reg_avail_info;
> > +static basic_block hardreg_last_bb;
> > +static int hardreg_first_use;
> >   static basic_block current_bb;
> >   /* See whether X, the source of a set, is something we want to consider 
> > for
> > @@ -747,6 +804,9 @@ static basic_block current_bb;
> >   static bool
> >   want_to_gcse_p (rtx x, machine_mode mode, HOST_WIDE_INT *max_distance_ptr)
> >   {
> > +  if (doing_hardreg_pre_p)
> > +    return true;
> This seems overly aggressive, though perhaps it's not so bad in practice.

This should be fine - want_to_gcse_p is aiming to filter out cases where the
lifetime of the newly-created pseudoregs would increase register pressure to an
extent that the resulting spills would outweigh the benefits of the code
motion.

For hardreg PRE we don't create new pseudoregs, but instead use specific
hardregs that aren't used by RA, and we check for conflicting uses when
determining the extent of the code motion.  So the checks in want_to_gcse_p are
irrelevant here.
 
> > @@ -1286,12 +1348,33 @@ hash_scan_set (rtx set, rtx_insn *insn, struct gcse_hash_table_d *table)
> >          able to handle code motion of insns with multiple sets.  */
> >       bool antic_p = (oprs_anticipatable_p (src, insn)
> >                       && !multiple_sets (insn));
> > +     if (doing_hardreg_pre_p)
> > +       {
> > +         /* A hardreg assignment is anticipatable only if the hardreg is
> > +            neither set nor used prior to this assignment.  */
> > +         auto info = reg_avail_info[current_hardreg_regno];
> > +         if ((info.last_bb == current_bb
> > +              && info.first_set < DF_INSN_LUID (insn))
> > +             || (hardreg_last_bb == current_bb
> > +                 && hardreg_first_use <= DF_INSN_LUID (insn)))
> > +           antic_p = false;
> > +       }
> > +
> >       /* An expression is not available if its operands are
> >          subsequently modified, including this insn.  It's also not
> >          available if this is a branch, because we can't insert
> >          a set after the branch.  */
> >       bool avail_p = (oprs_available_p (src, insn)
> >                       && ! JUMP_P (insn));
> > +     if (doing_hardreg_pre_p)
> > +       {
> > +         /* A hardreg assignment is only available if the hardreg is
> > +            not set later in the BB.  Uses of the hardreg are allowed.  */
> > +         auto info = reg_avail_info[current_hardreg_regno];
> > +         if (info.last_bb == current_bb
> > +             && info.last_set > DF_INSN_LUID (insn))
> > +           antic_p = false;
> Did you mean to set antic_p here, or should it have been avail_p?

Oops - that definitely should have been avail_p.  This will be fixed in the
next version I send.
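
For reference, the corrected block is the same as the quoted hunk but with
avail_p cleared instead:

	if (doing_hardreg_pre_p)
	  {
	    /* A hardreg assignment is only available if the hardreg is
	       not set later in the BB.  Uses of the hardreg are allowed.  */
	    auto info = reg_avail_info[current_hardreg_regno];
	    if (info.last_bb == current_bb
	        && info.last_set > DF_INSN_LUID (insn))
	      avail_p = false;
	  }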

> > @@ -1537,6 +1623,18 @@ compute_hash_table_work (struct gcse_hash_table_d *table)
> >           EXECUTE_IF_SET_IN_HARD_REG_SET (callee_clobbers, 0, regno, hrsi)
> >             record_last_reg_set_info (insn, regno);
> > +         if (doing_hardreg_pre_p)
> > +           {
> > +             /* This is covered by the above clobbers, but let's
> > +                conservatively make this work as well for hardregs that
> > +                are call-used but not call-clobbered.  */
> > +             record_last_reg_set_info (insn, current_hardreg_regno);
> > +
> > +             /* Mark this block as containing a call-clobber.  */
> > +             bitmap_set_bit (blocks_with_calls,
> > +                             BLOCK_FOR_INSN (insn)->index);
> > +   
> I don't think this is wrong, but isn't it redundant?  ie, shouldn't we have
> populated that bitmap elsewhere?

The existing code only uses blocks_with_calls for load motion, to indicate when
memory might be clobbered by a call.  The existing computation of
blocks_with_calls is in record_last_mem_set_info_common, but I bypass this with
an added early exit condition in record_last_mem_set_info (in the same way that
I bypass all the other code specific to load motion).

In the hardreg PRE pass we don't need to check for clobbered memory, but we do
need to check whether the hardreg might be clobbered by a call.  It seemed
sensible to reuse the existing suitably named bitmap to store this information,
but because I bypassed the existing computation, I needed to add the
computation back in elsewhere.
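
The bypass itself is just an early exit at the top of the wrapper, roughly
(a sketch, not the verbatim patch):

  static void
  record_last_mem_set_info (rtx_insn *insn)
  {
    /* Hardreg PRE performs no load motion, so tracking memory modifications
       (including the existing blocks_with_calls computation) is
       unnecessary.  */
    if (doing_hardreg_pre_p)
      return;

    if (!flag_gcse_lm)
      return;

    record_last_mem_set_info_common (insn, modify_mem_list,
                                     canon_modify_mem_list,
                                     modify_mem_list_set,
                                     blocks_with_calls);
  }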

> > @@ -2159,13 +2335,24 @@ pre_edge_insert (struct edge_list *edge_list, struct gcse_expr **index_map)
> >                     /* We can't insert anything on an abnormal and
> >                        critical edge, so we insert the insn at the end of
> > -                      the previous block. There are several alternatives
> > +                      the previous block.  There are several alternatives
> >                        detailed in Morgans book P277 (sec 10.5) for
> >                        handling this situation.  This one is easiest for
> > -                      now.  */
> > +                      now.
> > +                      For hardreg PRE, this would add an unwanted clobber
> > +                      of the hardreg, so we instead insert in the
> > +                      successor block, which may be partially redundant
> > +                      but is at least correct.  */
> But if it's abnormal critical, then doesn't this result in clobbers on paths
> where we didn't have them before?  Is that actually safe?

This should be entirely safe - the other paths will merely have a redundant set
with the same value that the hardreg already contains.

These paths will either have an earlier assignment on a different edge (that
will also have been marked for insertion by LCM), or will have the result of
the computation already available in the hardreg from an assignment that
existed before the pass.
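
As a concrete picture:

        B1                        B2
         \  (abnormal,             |
          \  critical edge)        |
           v                       v
          B3:  set fpmr, x    <-- inserted at start of B3 instead

LCM requested an insertion on the B1->B3 edge, which we can't split.  Placing
the set at the start of B3 means the B2->B3 path executes it too, but on that
path fpmr must already contain x - either from an insertion LCM made on
another in-edge, or from an assignment that existed before the pass - so the
extra set just rewrites the value fpmr already holds.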

> That's what I've got after a first pass over the bits.
> 
> Jeff
> 
