On Sun, Dec 29, 2024 at 10:54:03AM -0700, Jeff Law wrote: > > > On 12/5/24 8:45 AM, Andrew Carlotti wrote: > > > > So at a 30k foot level, one thing to be very leery of is extending the > > > lifetime of any hard register. It's probably not a big deal on aarch, but > > > it can cause all kinds of headaches on other targets. > > > > > > Essentially you probably need to avoid PRE on a hard register that's in a > > > likely spilled class. > > > > This is not intended to be used for ordinary registers, so that shouldn't > > be a > > concern. The use case is essentially as a form of mode-switching, where the > > active mode is specified by a register that can take arbitrary values at > > runtime. > I don't recall the details of the patch, but essentially you're going to > need some kind of way to prune the set of hard registers subject to this > optimization -- which would need to include filtering out any hard register > that's in a likely spilled class.
By "in a likely splilled class", do you mean a register where register allocation might end up splitting up the live range and spilling/reloading the value? Or something else? I think my pass handles this by only operating on a list of hardregs specified by a new target hook. It would be up to the target hooks to ensure that the optimisation isn't applied to inappropriate register classes. > > > > In this new hardreg PRE pass we want to move the actual assignment to the > > hardreg, so we have to additionally check whether moving the assignment > > earlier > > (and not just moving the compuation earlier) would conflict with existing > > uses > > of the hardreg. The transparency (and antic) bitmaps are used to indicate > > this > > property in LCM; perhaps the name is less accurate when applied to hardreg > > PRE. > Understood. I think this is primarily a naming problem. > > > > > @@ -747,6 +804,9 @@ static basic_block current_bb; > > > > static bool > > > > want_to_gcse_p (rtx x, machine_mode mode, HOST_WIDE_INT > > > > *max_distance_ptr) > > > > { > > > > + if (doing_hardreg_pre_p) > > > > + return true; > > > This seems overly aggressive, though perhaps it's not so bad in practice. > > > > This should be fine - want_to_gcse_p is aiming to filter out cases where the > > lifetime of the newly-created pseduoregs would increase register to an > > extent > > that the resulting spills would outweigh the benefits of the code motion. > > > > For hardreg PRE we don't create new pseduoregs, but instead use specific > > hardregs that aren't used by RA, and we check for conflicting uses in > > determining the extent of the code motion. So the checks in want_to_gcse_p > > are > > irrelevant here. > want_to_gcse_p was originally meant to filter out things we didn't know how > to handle or which were seen as not profitable to gcse, ever. Register > lifetimes didn't come into play until much later. Right - I realised there were still some cases that should be filtered out for hardreg PRE, so I changed this for the v2 patch. (https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671801.html) > > > > > > @@ -1537,6 +1623,18 @@ compute_hash_table_work (struct > > > > gcse_hash_table_d *table) > > > > EXECUTE_IF_SET_IN_HARD_REG_SET (callee_clobbers, 0, > > > > regno, hrsi) > > > > record_last_reg_set_info (insn, regno); > > > > + if (doing_hardreg_pre_p) > > > > + { > > > > + /* This is covered by the above clobbers, but let's > > > > + conservatively make this work as well for hardregs > > > > that > > > > + are call-used but not call-clobbered. */ > > > > + record_last_reg_set_info (insn, > > > > current_hardreg_regno); > > > > + > > > > + /* Mark this block as containing a call-clobber. */ > > > > + bitmap_set_bit (blocks_with_calls, > > > > + BLOCK_FOR_INSN (insn)->index); > > > > + > > > I don't think this is wrong, but isn't it redundant? ie, shouldn't we > > > have > > > populated that bitmap elsewhere? > > > > The existing code only uses blocks_with_calls for load motion, to indicate > > when > > memory might be clobbered by a call. The existing computation of > > blocks_with_calls is in > > record_last_mem_set_info_common, but I bypass this with an added early exit > > condition in record_last_mem_set_info (in the same way that I bypass all the > > other code specific to load motion). > If I remember correctly, it's also used for loads outside the load/store > motion variant of PRE. ie, PRE of loads (without trying to do any motion of > stores). I've dropped the reuse of blocks_with_calls from v2 of the patch anyway, because I realised that DF analysis can already detect call-clobbered registers. > > > > In the hardreg PRE pass we don't need to check for clobbered memory, but we > > do > > need to check whether the hardreg might be clobbered by a call. It seemed > > sensible to reuse the existing suitably named bitmap to store this > > information, > > but because I bypassed the existing computation, I needed to add the > > computation back in elsewhere. > ACK. Note this all plays into the need to walk into the FUSAGE notes as > well, which I think this patch failed to do. Does the use of DF analysis cover this, or are there additional checks still required? > Jeff