On Wed, Nov 13, 2024 at 07:03:44PM +0000, Richard Sandiford wrote: > Andrew Carlotti <andrew.carlo...@arm.com> writes: > > On Tue, Nov 12, 2024 at 10:42:50PM +0000, Richard Sandiford wrote: > >> Sorry for the slow review. I think Jeff's much better placed to comment > >> on this than I am, but here's a stab. Mostly it looks really good to me > >> FWIW. > >> > >> Andrew Carlotti <andrew.carlo...@arm.com> writes: > >> > This pass is used to optimise assignments to the FPMR register in > >> > aarch64. I chose to implement this as a middle-end pass because it > >> > mostly reuses the existing RTL PRE code within gcse.cc. > >> > > >> > Compared to RTL PRE, the key difference in this new pass is that we > >> > insert new writes directly to the destination hardreg, instead of > >> > writing to a new pseudo-register and copying the result later. This > >> > requires changes to the analysis portion of the pass, because sets > >> > cannot be moved before existing instructions that set, use or clobber > >> > the hardreg, and the value becomes unavailable after any uses of > >> > clobbers of the hardreg. > >> > > >> > This patch would currently break any debug instructions that use the > >> > value of fpmr in a region of code where that value is changed by this > >> > pass. I haven't worked out the best way to fix this, but I suspect the > >> > issue is uncommon and tricky enough that it would be best to just drop > >> > those debug instructions. > >> > >> Yeah, good question, and pass on that :) Will need to think more about it. > >> > >> > I've bootstrapped and regression tested this on aarch64, and it should > >> > be NFC > >> > on other targets. Aside from this, my testing so far has involved > >> > hacking in a > >> > single FP8 intrinsic and testing various parameters and control flow > >> > structures, and checking both the codegen and the LCM bitmaps. I intend > >> > to > >> > write better and more comprehensive tests once there are some real > >> > intrinsic > >> > implementations available to use. > >> > > >> > > >> > Is this approach good? Apart from fixing the debug instructions and > >> > adding tests, is there anything else I need to change? > >> > > >> > > >> > gcc/ChangeLog: > >> > > >> > * config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro. > >> > * gcse.cc (doing_hardreg_pre_p): New global variable. > >> > (current_hardreg_regno): Ditto. > >> > (compute_local_properties): Unset transp for hardreg clobbers. > >> > (prune_hardreg_uses): New. > >> > (want_to_gcse_p): Always return true for hardreg PRE. > >> > (hash_scan_set): Add checks for hardreg uses/clobbers. > >> > (oprs_unchanged_p): Disable load motion for hardreg PRE pass. > >> > (record_last_mem_set_info): Ditto. > >> > (compute_hash_table_work): Record hardreg uses. > >> > (prune_expressions): Mark hardreg sets as call-clobbered. > >> > (compute_pre_data): Add call to prune_hardreg_uses. > >> > (pre_expr_reaches_here_p_work): Add comment. > >> > (insert_insn_start_basic_block): New functions. > >> > (pre_edge_insert): Don't add hardreg sets to predecessor block. > >> > (pre_delete): Use hardreg for the reaching reg. > >> > (pre_gcse): Don't insert copies for hardreg PRE. > >> > (one_pre_gcse_pass): Disable load motion for hardreg PRE pass. > >> > (execute_hardreg_pre): New. > >> > (class pass_hardreg_pre): New. > >> > (pass_hardreg_pre::gate): New. > >> > (make_pass_hardreg_pre): New. > >> > * passes.def (pass_hardreg_pre): New pass. > >> > * tree-pass.h (make_pass_hardreg_pre): New. > >> > > >> > [...] > >> > @@ -728,6 +752,37 @@ compute_local_properties (sbitmap *transp, sbitmap > >> > *comp, sbitmap *antloc, > >> > } > >> > } > >> > } > >> > + > >> > +/* A hardreg set is not transparent in a block if there are any uses of > >> > that > >> > + hardreg. This filters the results of compute_local_properties, > >> > after the > >> > + result of that function has been used to define the kills bitmap. > >> > >> I think this is mostly my ignorance of the code, and would be obvious > >> if I tried it out locally, but: why do we need to do this after > >> computing the kills bitmap? For mode-switching, the kills bitmap > >> is the inverse of the transparency bitmap, but it sounds like here > >> you want the kills bitmap to be more selective. > > > > I had to work through the entire LCM algorithm before I understood how these > > bitmaps were being used (and I intend to update the documentation to make > > this > > more obvious). In summary, the kills and avail bitmaps indicate whether the > > result of an earlier expression is still available and up-to-date, whereas > > the > > transparent and anticipatable bitmaps indicate whether a later assignment > > can > > be moved earlier. > > Right. That part is pretty standard. > > > For the existing hoist/PRE passes these are the same - this is because new > > pseduoregs are used to hold the result of relocated computations, so the > > only > > obstruction is if the values of the inputs to the expression are changed. > > > > For the new hardreg PRE pass the bitmaps are different in one case - if the > > content of the hardreg is used, then the result of the expression remains > > available after the use, but it isn't possible to anticipate a future > > assignment by moving that assignment before the earlier use. > > But what I meant was: doesn't an assignment to the hard register block > movement/reuse in both directions? We can't move R:=X up through a block B > that requires R==Y (so X is not transparent in B). We also can't > reuse R:=X after a block that requires R==Y (because B kills X). > > That's why I was expecting the kill set to be updated too, not just the > transparency set.
An assignment to the hardreg does indeed block movement/reuse in both directions, but this case is handled elsewhere. The code here is specifically to handle instructions that use the hardreg but do not modify it. > Thanks, > Richard