Andrew Carlotti <andrew.carlo...@arm.com> writes: > On Tue, Nov 12, 2024 at 10:42:50PM +0000, Richard Sandiford wrote: >> Sorry for the slow review. I think Jeff's much better placed to comment >> on this than I am, but here's a stab. Mostly it looks really good to me >> FWIW. >> >> Andrew Carlotti <andrew.carlo...@arm.com> writes: >> > This pass is used to optimise assignments to the FPMR register in >> > aarch64. I chose to implement this as a middle-end pass because it >> > mostly reuses the existing RTL PRE code within gcse.cc. >> > >> > Compared to RTL PRE, the key difference in this new pass is that we >> > insert new writes directly to the destination hardreg, instead of >> > writing to a new pseudo-register and copying the result later. This >> > requires changes to the analysis portion of the pass, because sets >> > cannot be moved before existing instructions that set, use or clobber >> > the hardreg, and the value becomes unavailable after any uses of >> > clobbers of the hardreg. >> > >> > This patch would currently break any debug instructions that use the >> > value of fpmr in a region of code where that value is changed by this >> > pass. I haven't worked out the best way to fix this, but I suspect the >> > issue is uncommon and tricky enough that it would be best to just drop >> > those debug instructions. >> >> Yeah, good question, and pass on that :) Will need to think more about it. >> >> > I've bootstrapped and regression tested this on aarch64, and it should be >> > NFC >> > on other targets. Aside from this, my testing so far has involved hacking >> > in a >> > single FP8 intrinsic and testing various parameters and control flow >> > structures, and checking both the codegen and the LCM bitmaps. I intend to >> > write better and more comprehensive tests once there are some real >> > intrinsic >> > implementations available to use. >> > >> > >> > Is this approach good? Apart from fixing the debug instructions and >> > adding tests, is there anything else I need to change? >> > >> > >> > gcc/ChangeLog: >> > >> > * config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro. >> > * gcse.cc (doing_hardreg_pre_p): New global variable. >> > (current_hardreg_regno): Ditto. >> > (compute_local_properties): Unset transp for hardreg clobbers. >> > (prune_hardreg_uses): New. >> > (want_to_gcse_p): Always return true for hardreg PRE. >> > (hash_scan_set): Add checks for hardreg uses/clobbers. >> > (oprs_unchanged_p): Disable load motion for hardreg PRE pass. >> > (record_last_mem_set_info): Ditto. >> > (compute_hash_table_work): Record hardreg uses. >> > (prune_expressions): Mark hardreg sets as call-clobbered. >> > (compute_pre_data): Add call to prune_hardreg_uses. >> > (pre_expr_reaches_here_p_work): Add comment. >> > (insert_insn_start_basic_block): New functions. >> > (pre_edge_insert): Don't add hardreg sets to predecessor block. >> > (pre_delete): Use hardreg for the reaching reg. >> > (pre_gcse): Don't insert copies for hardreg PRE. >> > (one_pre_gcse_pass): Disable load motion for hardreg PRE pass. >> > (execute_hardreg_pre): New. >> > (class pass_hardreg_pre): New. >> > (pass_hardreg_pre::gate): New. >> > (make_pass_hardreg_pre): New. >> > * passes.def (pass_hardreg_pre): New pass. >> > * tree-pass.h (make_pass_hardreg_pre): New. >> > >> > [...] >> > @@ -728,6 +752,37 @@ compute_local_properties (sbitmap *transp, sbitmap >> > *comp, sbitmap *antloc, >> > } >> > } >> > } >> > + >> > +/* A hardreg set is not transparent in a block if there are any uses of >> > that >> > + hardreg. This filters the results of compute_local_properties, after >> > the >> > + result of that function has been used to define the kills bitmap. >> >> I think this is mostly my ignorance of the code, and would be obvious >> if I tried it out locally, but: why do we need to do this after >> computing the kills bitmap? For mode-switching, the kills bitmap >> is the inverse of the transparency bitmap, but it sounds like here >> you want the kills bitmap to be more selective. > > I had to work through the entire LCM algorithm before I understood how these > bitmaps were being used (and I intend to update the documentation to make this > more obvious). In summary, the kills and avail bitmaps indicate whether the > result of an earlier expression is still available and up-to-date, whereas the > transparent and anticipatable bitmaps indicate whether a later assignment can > be moved earlier.
Right. That part is pretty standard. > For the existing hoist/PRE passes these are the same - this is because new > pseduoregs are used to hold the result of relocated computations, so the only > obstruction is if the values of the inputs to the expression are changed. > > For the new hardreg PRE pass the bitmaps are different in one case - if the > content of the hardreg is used, then the result of the expression remains > available after the use, but it isn't possible to anticipate a future > assignment by moving that assignment before the earlier use. But what I meant was: doesn't an assignment to the hard register block movement/reuse in both directions? We can't move R:=X up through a block B that requires R==Y (so X is not transparent in B). We also can't reuse R:=X after a block that requires R==Y (because B kills X). That's why I was expecting the kill set to be updated too, not just the transparency set. Thanks, Richard