On Wed, Nov 13, 2024 at 07:03:44PM +0000, Richard Sandiford wrote:
> Andrew Carlotti <andrew.carlo...@arm.com> writes:
> > On Tue, Nov 12, 2024 at 10:42:50PM +0000, Richard Sandiford wrote:
> >> Sorry for the slow review.  I think Jeff's much better placed to comment
> >> on this than I am, but here's a stab.  Mostly it looks really good to me
> >> FWIW.
> >> 
> >> Andrew Carlotti <andrew.carlo...@arm.com> writes:
> >> > This pass is used to optimise assignments to the FPMR register in
> >> > aarch64.  I chose to implement this as a middle-end pass because it
> >> > mostly reuses the existing RTL PRE code within gcse.cc.
> >> >
> >> > Compared to RTL PRE, the key difference in this new pass is that we
> >> > insert new writes directly to the destination hardreg, instead of
> >> > writing to a new pseudo-register and copying the result later.  This
> >> > requires changes to the analysis portion of the pass, because sets
> >> > cannot be moved before existing instructions that set, use or clobber
> >> > the hardreg, and the value becomes unavailable after any uses of
> >> > clobbers of the hardreg.
> >> >
> >> > This patch would currently break any debug instructions that use the
> >> > value of fpmr in a region of code where that value is changed by this
> >> > pass.  I haven't worked out the best way to fix this, but I suspect the
> >> > issue is uncommon and tricky enough that it would be best to just drop
> >> > those debug instructions.
> >> 
> >> Yeah, good question, and pass on that :)  Will need to think more about it.
> >> 
> >> > I've bootstrapped and regression tested this on aarch64, and it should 
> >> > be NFC
> >> > on other targets.  Aside from this, my testing so far has involved 
> >> > hacking in a
> >> > single FP8 intrinsic and testing various parameters and control flow
> >> > structures, and checking both the codegen and the LCM bitmaps.  I intend 
> >> > to
> >> > write better and more comprehensive tests once there are some real 
> >> > intrinsic
> >> > implementations available to use.
> >> >
> >> >
> >> > Is this approach good?  Apart from fixing the debug instructions and
> >> > adding tests, is there anything else I need to change?
> >> >
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >  * config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro.
> >> >  * gcse.cc (doing_hardreg_pre_p): New global variable.
> >> >  (current_hardreg_regno): Ditto.
> >> >  (compute_local_properties): Unset transp for hardreg clobbers.
> >> >  (prune_hardreg_uses): New.
> >> >  (want_to_gcse_p): Always return true for hardreg PRE.
> >> >  (hash_scan_set): Add checks for hardreg uses/clobbers.
> >> >  (oprs_unchanged_p): Disable load motion for hardreg PRE pass.
> >> >  (record_last_mem_set_info): Ditto.
> >> >  (compute_hash_table_work): Record hardreg uses.
> >> >  (prune_expressions): Mark hardreg sets as call-clobbered.
> >> >  (compute_pre_data): Add call to prune_hardreg_uses.
> >> >  (pre_expr_reaches_here_p_work): Add comment.
> >> >  (insert_insn_start_basic_block): New functions.
> >> >  (pre_edge_insert): Don't add hardreg sets to predecessor block.
> >> >  (pre_delete): Use hardreg for the reaching reg.
> >> >  (pre_gcse): Don't insert copies for hardreg PRE.
> >> >  (one_pre_gcse_pass): Disable load motion for hardreg PRE pass.
> >> >  (execute_hardreg_pre): New.
> >> >  (class pass_hardreg_pre): New.
> >> >  (pass_hardreg_pre::gate): New.
> >> >  (make_pass_hardreg_pre): New.
> >> >  * passes.def (pass_hardreg_pre): New pass.
> >> >  * tree-pass.h (make_pass_hardreg_pre): New.
> >> >
> >> > [...]
> >> > @@ -728,6 +752,37 @@ compute_local_properties (sbitmap *transp, sbitmap 
> >> > *comp, sbitmap *antloc,
> >> >          }
> >> >      }
> >> >  }
> >> > +
> >> > +/* A hardreg set is not transparent in a block if there are any uses of 
> >> > that
> >> > +   hardreg.  This filters the results of compute_local_properties, 
> >> > after the
> >> > +   result of that function has been used to define the kills bitmap.
> >> 
> >> I think this is mostly my ignorance of the code, and would be obvious
> >> if I tried it out locally, but: why do we need to do this after
> >> computing the kills bitmap?  For mode-switching, the kills bitmap
> >> is the inverse of the transparency bitmap, but it sounds like here
> >> you want the kills bitmap to be more selective.
> >
> > I had to work through the entire LCM algorithm before I understood how these
> > bitmaps were being used (and I intend to update the documentation to make 
> > this
> > more obvious).  In summary, the kills and avail bitmaps indicate whether the
> > result of an earlier expression is still available and up-to-date, whereas 
> > the
> > transparent and anticipatable bitmaps indicate whether a later assignment 
> > can
> > be moved earlier.
> 
> Right.  That part is pretty standard.
> 
> > For the existing hoist/PRE passes these are the same - this is because new
> > pseduoregs are used to hold the result of relocated computations, so the 
> > only
> > obstruction is if the values of the inputs to the expression are changed.
> >
> > For the new hardreg PRE pass the bitmaps are different in one case - if the
> > content of the hardreg is used, then the result of the expression remains
> > available after the use, but it isn't possible to anticipate a future
> > assignment by moving that assignment before the earlier use.
> 
> But what I meant was: doesn't an assignment to the hard register block
> movement/reuse in both directions?  We can't move R:=X up through a block B
> that requires R==Y (so X is not transparent in B).  We also can't
> reuse R:=X after a block that requires R==Y (because B kills X).
> 
> That's why I was expecting the kill set to be updated too, not just the
> transparency set.

An assignment to the hardreg does indeed block movement/reuse in both
directions, but this case is handled elsewhere.  The code here is specifically
to handle instructions that use the hardreg but do not modify it.

> Thanks,
> Richard

Reply via email to