On 12/30/24 5:53 AM, Andrew Carlotti wrote:
On Sun, Dec 29, 2024 at 10:54:03AM -0700, Jeff Law wrote:
On 12/5/24 8:45 AM, Andrew Carlotti wrote:
So at a 30k foot level, one thing to be very leery of is extending the
lifetime of any hard register. It's probably not a big deal on aarch, but
it can cause all kinds of headaches on other targets.
Essentially you probably need to avoid PRE on a hard register that's in a
likely spilled class.
This is not intended to be used for ordinary registers, so that shouldn't be a
concern. The use case is essentially as a form of mode-switching, where the
active mode is specified by a register that can take arbitrary values at
runtime.
I don't recall the details of the patch, but essentially you're going to
need some kind of way to prune the set of hard registers subject to this
optimization -- which would need to include filtering out any hard register
that's in a likely spilled class.
By "in a likely splilled class", do you mean a register where register
allocation might end up splitting up the live range and spilling/reloading the
value? Or something else?
It's a bit more complex than that, but at a 30k foot level, yes. The
most common cases show up when there are register classes that have a
single member. If those register classes are heavily used, then they're
likely going to be spilled -- you're usually better off keeping
lifetimes short for those cases rather than doing something like CSE or
combining uses/sets and extending their lifetimes.
eax on x86 would be a great example, particularly since it's also the
return value. But there are others, like %r1 on the PA, which is used for
every symbolic memory address generation.
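For reference, a minimal sketch of that kind of filter, assuming the usual GCC
internal headers (target.h, regs.h, hard-reg-set.h); it relies on the existing
targetm.class_likely_spilled_p hook and the REGNO_REG_CLASS macro, and the
function name itself is only illustrative:

  /* Sketch only: reject hard registers whose class the target reports
     as likely spilled, since extending their lifetimes tends to create
     spill pressure.  The function name is illustrative, not from any
     patch.  */
  static bool
  hardreg_pre_candidate_p (unsigned int regno)
  {
    /* Fixed registers (stack pointer, frame pointer, ...) are never
       candidates.  */
    if (fixed_regs[regno])
      return false;

    /* Reject anything in a likely-spilled class, e.g. single-member
       classes like the ones mentioned above.  */
    if (targetm.class_likely_spilled_p (REGNO_REG_CLASS (regno)))
      return false;

    return true;
  }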
I think my pass handles this by only operating on a list of hardregs specified
by a new target hook. It would be up to each target's hook implementation to
ensure that the optimisation isn't applied to inappropriate register classes.
Perfect.
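As a purely hypothetical sketch of what such an allow-list hook could look
like (the function name, register number and comment below are invented for
illustration; the real patch defines its own hook and registers):

  /* Hypothetical example only: expose a single mode-control register to
     the hardreg PRE pass and exclude everything else.  */
  #define EXAMPLE_MODE_CONTROL_REGNUM 68	/* placeholder hard regno  */

  static bool
  example_hardreg_pre_allowed (unsigned int regno)
  {
    /* Ordinary register classes are deliberately excluded, so the
       likely-spilled concern above does not arise.  */
    return regno == EXAMPLE_MODE_CONTROL_REGNUM;
  }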
In the hardreg PRE pass we don't need to check for clobbered memory, but we do
need to check whether the hardreg might be clobbered by a call. It seemed
sensible to reuse the existing suitably named bitmap to store this information,
but because I bypassed the existing computation, I needed to add the
computation back in elsewhere.
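For reference, a minimal sketch of how a pass can ask whether a particular
call might clobber a given hard register, using the existing insn_callee_abi
interface from function-abi.h (the wrapper name is illustrative):

  /* Sketch: return true if CALL might clobber hard register REGNO while
     it holds a value of mode MODE.  Only the wrapper is invented; the
     insn_callee_abi / clobbers_reg_p calls are the existing interface.  */
  static bool
  call_clobbers_hardreg_p (const rtx_insn *call, machine_mode mode,
                           unsigned int regno)
  {
    return insn_callee_abi (call).clobbers_reg_p (mode, regno);
  }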
ACK. Note this all plays into the need to walk into the FUSAGE notes as
well, which I think this patch failed to do.
Does the use of DF analysis cover this, or are there additional checks still
required?
It appears that DF does the right thing. So it may not be an issue for you:
  /* Record the registers used to pass arguments, and explicitly
     noted as clobbered.  */
  for (note = CALL_INSN_FUNCTION_USAGE (insn_info->insn); note;
       note = XEXP (note, 1))
    {
      if (GET_CODE (XEXP (note, 0)) == USE)
        df_uses_record (collection_rec, &XEXP (XEXP (note, 0), 0),
                        DF_REF_REG_USE, bb, insn_info, flags);
      else if (GET_CODE (XEXP (note, 0)) == CLOBBER)
        {
          if (REG_P (XEXP (XEXP (note, 0), 0)))
            {
              unsigned int regno = REGNO (XEXP (XEXP (note, 0), 0));
              if (!TEST_HARD_REG_BIT (defs_generated, regno))
                df_defs_record (collection_rec, XEXP (note, 0), bb,
                                insn_info, flags);
            }
          else
            df_uses_record (collection_rec, &XEXP (note, 0),
                            DF_REF_REG_USE, bb, insn_info, flags);
        }
    }
Jeff