Andrew Carlotti <andrew.carlo...@arm.com> writes:
> On Tue, Nov 12, 2024 at 10:42:50PM +0000, Richard Sandiford wrote:
>> Sorry for the slow review.  I think Jeff's much better placed to comment
>> on this than I am, but here's a stab.  Mostly it looks really good to me
>> FWIW.
>> 
>> Andrew Carlotti <andrew.carlo...@arm.com> writes:
>> > This pass is used to optimise assignments to the FPMR register in
>> > aarch64.  I chose to implement this as a middle-end pass because it
>> > mostly reuses the existing RTL PRE code within gcse.cc.
>> >
>> > Compared to RTL PRE, the key difference in this new pass is that we
>> > insert new writes directly to the destination hardreg, instead of
>> > writing to a new pseudo-register and copying the result later.  This
>> > requires changes to the analysis portion of the pass, because sets
>> > cannot be moved before existing instructions that set, use or clobber
>> > the hardreg, and the value becomes unavailable after any uses of
>> > clobbers of the hardreg.
>> >
>> > This patch would currently break any debug instructions that use the
>> > value of fpmr in a region of code where that value is changed by this
>> > pass.  I haven't worked out the best way to fix this, but I suspect the
>> > issue is uncommon and tricky enough that it would be best to just drop
>> > those debug instructions.
>> 
>> Yeah, good question, and pass on that :)  Will need to think more about it.
>> 
>> > I've bootstrapped and regression tested this on aarch64, and it should be 
>> > NFC
>> > on other targets.  Aside from this, my testing so far has involved hacking 
>> > in a
>> > single FP8 intrinsic and testing various parameters and control flow
>> > structures, and checking both the codegen and the LCM bitmaps.  I intend to
>> > write better and more comprehensive tests once there are some real 
>> > intrinsic
>> > implementations available to use.
>> >
>> >
>> > Is this approach good?  Apart from fixing the debug instructions and
>> > adding tests, is there anything else I need to change?
>> >
>> >
>> > gcc/ChangeLog:
>> >
>> >    * config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro.
>> >    * gcse.cc (doing_hardreg_pre_p): New global variable.
>> >    (current_hardreg_regno): Ditto.
>> >    (compute_local_properties): Unset transp for hardreg clobbers.
>> >    (prune_hardreg_uses): New.
>> >    (want_to_gcse_p): Always return true for hardreg PRE.
>> >    (hash_scan_set): Add checks for hardreg uses/clobbers.
>> >    (oprs_unchanged_p): Disable load motion for hardreg PRE pass.
>> >    (record_last_mem_set_info): Ditto.
>> >    (compute_hash_table_work): Record hardreg uses.
>> >    (prune_expressions): Mark hardreg sets as call-clobbered.
>> >    (compute_pre_data): Add call to prune_hardreg_uses.
>> >    (pre_expr_reaches_here_p_work): Add comment.
>> >    (insert_insn_start_basic_block): New functions.
>> >    (pre_edge_insert): Don't add hardreg sets to predecessor block.
>> >    (pre_delete): Use hardreg for the reaching reg.
>> >    (pre_gcse): Don't insert copies for hardreg PRE.
>> >    (one_pre_gcse_pass): Disable load motion for hardreg PRE pass.
>> >    (execute_hardreg_pre): New.
>> >    (class pass_hardreg_pre): New.
>> >    (pass_hardreg_pre::gate): New.
>> >    (make_pass_hardreg_pre): New.
>> >    * passes.def (pass_hardreg_pre): New pass.
>> >    * tree-pass.h (make_pass_hardreg_pre): New.
>> >
>> > [...]
>> > @@ -728,6 +752,37 @@ compute_local_properties (sbitmap *transp, sbitmap 
>> > *comp, sbitmap *antloc,
>> >    }
>> >      }
>> >  }
>> > +
>> > +/* A hardreg set is not transparent in a block if there are any uses of 
>> > that
>> > +   hardreg.  This filters the results of compute_local_properties, after 
>> > the
>> > +   result of that function has been used to define the kills bitmap.
>> 
>> I think this is mostly my ignorance of the code, and would be obvious
>> if I tried it out locally, but: why do we need to do this after
>> computing the kills bitmap?  For mode-switching, the kills bitmap
>> is the inverse of the transparency bitmap, but it sounds like here
>> you want the kills bitmap to be more selective.
>
> I had to work through the entire LCM algorithm before I understood how these
> bitmaps were being used (and I intend to update the documentation to make this
> more obvious).  In summary, the kills and avail bitmaps indicate whether the
> result of an earlier expression is still available and up-to-date, whereas the
> transparent and anticipatable bitmaps indicate whether a later assignment can
> be moved earlier.

Right.  That part is pretty standard.

> For the existing hoist/PRE passes these are the same - this is because new
> pseduoregs are used to hold the result of relocated computations, so the only
> obstruction is if the values of the inputs to the expression are changed.
>
> For the new hardreg PRE pass the bitmaps are different in one case - if the
> content of the hardreg is used, then the result of the expression remains
> available after the use, but it isn't possible to anticipate a future
> assignment by moving that assignment before the earlier use.

But what I meant was: doesn't an assignment to the hard register block
movement/reuse in both directions?  We can't move R:=X up through a block B
that requires R==Y (so X is not transparent in B).  We also can't
reuse R:=X after a block that requires R==Y (because B kills X).

That's why I was expecting the kill set to be updated too, not just the
transparency set.

Thanks,
Richard

Reply via email to