On Sun, Dec 29, 2024 at 10:54:03AM -0700, Jeff Law wrote:
> 
> 
> On 12/5/24 8:45 AM, Andrew Carlotti wrote:
> 
> > > So at a 30k foot level, one thing to be very leery of is extending the
> > > lifetime of any hard register.  It's probably not a big deal on aarch, but
> > > it can cause all kinds of headaches on other targets.
> > > 
> > > Essentially you probably need to avoid PRE on a hard register that's in a
> > > likely spilled class.
> > 
> > This is not intended to be used for ordinary registers, so that shouldn't 
> > be a
> > concern.  The use case is essentially as a form of mode-switching, where the
> > active mode is specified by a register that can take arbitrary values at
> > runtime.
> I don't recall the details of the patch, but essentially you're going to
> need some kind of way to prune the set of hard registers subject to this
> optimization -- which would need to include filtering out any hard register
> that's in a likely spilled class.

By "in a likely splilled class", do you mean a register where register
allocation might end up splitting up the live range and spilling/reloading the
value?  Or something else?

I think my pass handles this by only operating on a list of hardregs specified
by a new target hook.  It would be up to the target hooks to ensure that the
optimisation isn't applied to inappropriate register classes.

> > 
> > In this new hardreg PRE pass we want to move the actual assignment to the
> > hardreg, so we have to additionally check whether moving the assignment 
> > earlier
> > (and not just moving the compuation earlier) would conflict with existing 
> > uses
> > of the hardreg.  The transparency (and antic) bitmaps are used to indicate 
> > this
> > property in LCM; perhaps the name is less accurate when applied to hardreg 
> > PRE.
> Understood.  I think this is primarily a naming problem.
> 
> > > > @@ -747,6 +804,9 @@ static basic_block current_bb;
> > > >    static bool
> > > >    want_to_gcse_p (rtx x, machine_mode mode, HOST_WIDE_INT 
> > > > *max_distance_ptr)
> > > >    {
> > > > +  if (doing_hardreg_pre_p)
> > > > +    return true;
> > > This seems overly aggressive, though perhaps it's not so bad in practice.
> > 
> > This should be fine - want_to_gcse_p is aiming to filter out cases where the
> > lifetime of the newly-created pseduoregs would increase register to an 
> > extent
> > that the resulting spills would outweigh the benefits of the code motion.
> > 
> > For hardreg PRE we don't create new pseduoregs, but instead use specific
> > hardregs that aren't used by RA, and we check for conflicting uses in
> > determining the extent of the code motion.  So the checks in want_to_gcse_p 
> > are
> > irrelevant here.
> want_to_gcse_p was originally meant to filter out things we didn't know how
> to handle or which were seen as not profitable to gcse, ever. Register
> lifetimes didn't come into play until much later.

Right - I realised there were still some cases that should be filtered out for
hardreg PRE, so I changed this for the v2 patch.
(https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671801.html)
 
> > 
> > > > @@ -1537,6 +1623,18 @@ compute_hash_table_work (struct 
> > > > gcse_hash_table_d *table)
> > > >               EXECUTE_IF_SET_IN_HARD_REG_SET (callee_clobbers, 0, 
> > > > regno, hrsi)
> > > >                 record_last_reg_set_info (insn, regno);
> > > > +             if (doing_hardreg_pre_p)
> > > > +               {
> > > > +                 /* This is covered by the above clobbers, but let's
> > > > +                    conservatively make this work as well for hardregs 
> > > > that
> > > > +                    are call-used but not call-clobbered.  */
> > > > +                 record_last_reg_set_info (insn, 
> > > > current_hardreg_regno);
> > > > +
> > > > +                 /* Mark this block as containing a call-clobber.  */
> > > > +                 bitmap_set_bit (blocks_with_calls,
> > > > +                                 BLOCK_FOR_INSN (insn)->index);
> > > > +       
> > > I don't think this is wrong, but isn't it redundant?  ie, shouldn't we 
> > > have
> > > populated that bitmap elsewhere?
> > 
> > The existing code only uses blocks_with_calls for load motion, to indicate 
> > when
> > memory might be clobbered by a call.  The existing computation of 
> > blocks_with_calls is in
> > record_last_mem_set_info_common, but I bypass this with an added early exit
> > condition in record_last_mem_set_info (in the same way that I bypass all the
> > other code specific to load motion).
> If I remember correctly, it's also used for loads outside the load/store
> motion variant of PRE.  ie, PRE of loads (without trying to do any motion of
> stores).

I've dropped the reuse of blocks_with_calls from v2 of the patch anyway,
because I realised that DF analysis can already detect call-clobbered
registers.

> > 
> > In the hardreg PRE pass we don't need to check for clobbered memory, but we 
> > do
> > need to check whether the hardreg might be clobbered by a call.  It seemed
> > sensible to reuse the existing suitably named bitmap to store this 
> > information,
> > but because I bypassed the existing computation, I needed to add the
> > computation back in elsewhere.
> ACK.  Note this all plays into the need to walk into the FUSAGE notes as
> well, which I think this patch failed to do.

Does the use of DF analysis cover this, or are there additional checks still 
required?

> Jeff

Reply via email to