On 11/20/23 11:26, Richard Sandiford wrote:
+
+/* If we know the destination of CODE only uses some low bits
+ (say just the QI bits of an SI operation), then return true
+ if we can propagate the need for just the subset of bits
+ from the destination to the sources. */
+
+static bool
+safe_for_live_propagation (rtx_code code)
+{
+ /* First handle rtx classes which as a whole are known to
+ be either safe or unsafe. */
+ switch (GET_RTX_CLASS (code))
+ {
+ case RTX_OBJ:
+ return true;
+
+ case RTX_COMPARE:
+ case RTX_COMM_COMPARE:
+ case RTX_TERNARY:
I suppose operands 1 and 2 of an IF_THEN_ELSE would be safe.
Yes. The only downside is we'd need to special-case IF_THEN_ELSE
because it doesn't apply to operand 0. Right now we're pretty
conservative with anything other than binary codes. Comment added about
the possibility of handling I-T-E as well.
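For reference, the special case would presumably live at the point
where we propagate liveness into the sources; something like the
sketch below, where mark_fully_live and propagate_live_bits are
made-up names standing in for the existing set/use machinery:

  if (GET_CODE (src) == IF_THEN_ELSE)
    {
      /* Operand 0 is the condition; a narrowed destination tells us
         nothing about which condition bits matter, so it stays fully
         live.  */
      mark_fully_live (XEXP (src, 0), livenow);
      /* Operands 1 and 2 feed the destination directly, so the
         destination's live chunks transfer to them.  */
      propagate_live_bits (XEXP (src, 1), live_tmp);
      propagate_live_bits (XEXP (src, 2), live_tmp);
    }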
This made me wonder: is this safe for !TRULY_NOOP_TRUNCATION? But I
suppose it is. What !TRULY_NOOP_TRUNCATION models is that the target
mode has a canonical form that must be maintained, and wouldn't be by
a plain subreg. So TRULY_NOOP_TRUNCATION is more of an issue for
consumers of the liveness information, rather than for computing the
liveness information itself.
Really interesting question. I think ext-dce is safe. As you note this
is more a consumer side question and on the consumer side we don't muck
with TRUNCATE at all.
+ case SS_TRUNCATE:
+ case US_TRUNCATE:
+ case PLUS:
+ case MULT:
+ case SS_MULT:
+ case US_MULT:
+ case SMUL_HIGHPART:
+ case UMUL_HIGHPART:
+ case AND:
+ case IOR:
+ case XOR:
+ case SS_PLUS:
+ case US_PLUS:
I don't think it's safe to propagate through saturating ops.
They don't have the property that (x op y)%z == (x%z op y%z)%z
Yea, you're probably right. Removed.
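A concrete counterexample with 8-bit unsigned saturating addition and
z = 16 (only the low nibble live):

  x = 0xF0, y = 0x20:
    (x US_PLUS y) % 16           = 0xFF % 16 = 0xF   (0x110 saturates)
    ((x%16) US_PLUS (y%16)) % 16 = 0x00 % 16 = 0x0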
+
+ /* We don't support vector destinations or destinations
+ wider than DImode. It is safe to continue this loop.
+ At worst, it will leave things live which could have
+ been made dead. */
+ if (VECTOR_MODE_P (GET_MODE (x)) || GET_MODE (x) > E_DImode)
+ continue;
The E_DImode comparison hard-codes an assumption about the order of
the mode enum. How about using something like:
Guilty as charged :-) Not surprised you called that out.
scalar_int_mode outer_mode;
if (!is_a<scalar_int_mode> (GET_MODE (x), &outer_mode)
|| GET_MODE_BITSIZE (outer_mode) > 64)
continue;
Wouldn't we also want to verify that the size is constant, or is it the
case that all the variable cases are vector (and would we want to
actually depend on that)?
The other continues use iter.skip_subrtxes (); when continuing.
I don't think it matters for correctness whether we do that or not,
since SETs and CLOBBERs shouldn't be nested. But skipping should
be faster.
My thought on not skipping the sub-rtxs in this case was to make sure we
processed things like memory addresses which could have embedded side
effects. It probably doesn't matter in practice though.
Maybe it would be worth splitting the SET/CLOBBER code out into a
subfunction, to make the loop iteration easier to handle?
Yea, it could use another round of such work. In the original, set and
use handling were one big function, which drove me nuts.
+ /* Transfer all the LIVENOW bits for X into LIVE_TMP. */
+ HOST_WIDE_INT rn = REGNO (SUBREG_REG (x));
+ for (HOST_WIDE_INT i = 4 * rn; i < 4 * rn + 4; i++)
+ if (bitmap_bit_p (livenow, i))
+ bitmap_set_bit (live_tmp, i);
+
+ /* The mode of the SUBREG tells us how many bits we can
+ clear. */
+ machine_mode mode = GET_MODE (x);
+ HOST_WIDE_INT size = GET_MODE_SIZE (mode).to_constant ();
+ bitmap_clear_range (livenow, 4 * rn, size);
Is clearing SIZE bytes correct? Feels like it should be clearing
something like log2 (size) + 1 instead.
Yea, I think you're right. Fixed.
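Presumably the fix looks something like this, mapping the byte size
onto the number of live chunks (the bitmap tracks four chunks per
pseudo: bits 0-7, 8-15, 16-31 and 32-63, so QImode clears one chunk,
HImode two, SImode three and DImode four):

  HOST_WIDE_INT size = GET_MODE_SIZE (mode).to_constant ();
  bitmap_clear_range (livenow, 4 * rn, floor_log2 (size) + 1);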
+ bit = SUBREG_BYTE (x).to_constant () * BITS_PER_UNIT;
+ if (WORDS_BIG_ENDIAN)
+ bit = (GET_MODE_BITSIZE (GET_MODE (SUBREG_REG (x))).to_constant ()
+ - GET_MODE_BITSIZE (GET_MODE (x)).to_constant () - bit);
+
+ /* Catch big endian correctness issues rather than triggering
+ undefined behavior. */
+ gcc_assert (bit < sizeof (HOST_WIDE_INT) * 8);
This could probably use subreg_lsb, to avoid the inline endianness adjustment.
That's the routine I was looking for! The original totally mucked up
the endianness adjustment and I kept thinking we must have an existing
routine to do this for us but didn't find it immediately, so I just
banged out a trivial endianness adjustment.
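For the record, with subreg_lsb the whole adjustment, including the
WORDS_BIG_ENDIAN case, collapses to a one-liner (assuming the constant
offset has already been checked):

  bit = subreg_lsb (x).to_constant ();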
+
+ mask = GET_MODE_MASK (GET_MODE (SUBREG_REG (x))) << bit;
+ if (!mask)
+ mask = -0x100000000ULL;
Not sure I follow this. What does the -0x100000000ULL constant indicate?
Also, isn't it the mask of the outer register that is shifted, rather
than the mask of the inner mode? E.g. if we have:
Inherited. I should have marked it like the other one as needing
investigation. FWIW, -0x100000000ULL is 0xFFFFFFFF00000000, i.e. just
bits 32..63, so it looks like a fallback that keeps the upper half of
the register live when the shifted mask wraps to zero. Probably the
fastest way is to just rip it out and test to see what breaks.
+ /* Some ports generate (clobber (const_int)). */
+ else if (CONST_INT_P (x))
Ugh. What on earth do they do that for? I thought that (with const0_rtx)
was reserved as combine's "something went wrong" representation.
I stumbled over it on avr I think. I didn't dig into why other than to
note it occurred in an insn pattern as opposed to rtl templates for
generation from define_expands, splitters, etc.
+/* INSN has a sign/zero extended source inside SET that we will
+ try to turn into a SUBREG. */
+static void
+ext_dce_try_optimize_insn (rtx_insn *insn, rtx set, bitmap changed_pseudos)
+{
+ rtx src = SET_SRC (set);
+ rtx inner = XEXP (src, 0);
+
+ /* Avoid (subreg (mem)) and other constructs which may are valid RTL, but
s/are// or s/may are/may be/
Fixed. "may be".
+
+/* Process uses in INSN. Set appropriate bits in LIVENOW for any chunks of
+ pseudos that become live, potentially filtering using bits from LIVE_TMP.
+
+ If MODIFIED is true, then optimize sign/zero extensions to SUBREGs when
MODIFY
Fixed.
+
+ /* ?!? How much of this should mirror SET handling, potentially
+ being shared? */
+ if (SUBREG_P (dst) && SUBREG_BYTE (dst).is_constant ())
+ {
+ bit = SUBREG_BYTE (dst).to_constant () * BITS_PER_UNIT;
+ if (WORDS_BIG_ENDIAN)
+ bit = (GET_MODE_BITSIZE (GET_MODE (SUBREG_REG (dst))).to_constant ()
+ - GET_MODE_BITSIZE (GET_MODE (dst)).to_constant () - bit);
Same subreg_lsb thing here.
Yea, I'll fix them all. There's 2 or 3 IIRC.
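For this instance that would be:

  bit = subreg_lsb (dst).to_constant ();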
+
+ /* ?!? What is the point of this adjustment to DST_MASK? */
+ if (code == PLUS || code == MINUS
+ || code == MULT || code == ASHIFT)
+ dst_mask
+ = dst_mask ? ((2ULL << floor_log2 (dst_mask)) - 1) : 0;
Yeah, sympathise with the ?!? here :)
Inherited. Like the other bit of magic I think I'll do a test with them
pulled out to see if I can make something undesirable trigger.
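For what it's worth, the expression smears the highest set bit of
DST_MASK downward:

  /* e.g. dst_mask == 0x10: 2ULL << floor_log2 (0x10) == 0x20,
     so the result is 0x20 - 1 == 0x1F.  */

which would make sense if the intent is that for carry-propagating
codes (PLUS, MINUS, MULT, ASHIFT) a live destination bit forces all
lower source bits live too; but that's a guess pending the test.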
+ /* We will handle the other operand of a binary operator
+ at the bottom of the loop by resetting Y. */
+ if (BINARY_P (src))
+ y = XEXP (src, 0);
What about UNARY_P, given that NOT is included in the codes above?
We'll break out of that inner for(;;) and then iterate into the
sub-rtx, marking the relevant bits live. FWIW, the control flow of this
code continues to be my biggest concern from a maintenance standpoint.
Figuring it out was a major pain and I've tried to document what is and
what is not safe. But it's still painful to walk through.
I pondered if note_uses/note_stores would be better, but concluded we'd
just end up with a ton of state objects to carry around and reasoning
about that would be just as hard.
+ else
+ y = src;
+
+ /* We're inside a SET and want to process the source operands
+ making things live. Breaking from this loop will cause
+ the iterator to work on sub-rtxs, so it is safe to break
+ if we see something we don't know how to handle. */
+ for (;;)
+ {
+ /* Strip an outer STRICT_LOW_PART or paradoxical subreg.
+ That has the effect of making the whole referenced
+ register live. We might be able to avoid that for
+ STRICT_LOW_PART at some point. */
+ if (GET_CODE (x) == STRICT_LOW_PART
+ || paradoxical_subreg_p (x))
+ x = XEXP (x, 0);
Isn't "x" still the set at this point? If it was intended to be "y",
then I thought STRICT_LOW_PART was for destinations only.
I'll have to go back through the history. Something definitely does not
look right here.
+ else if (!CONSTANT_P (y))
+ break;
+ /* We might have (ashift (const_int 1) (reg...)) */
+ else if (CONSTANT_P (y)
+ && binop_implies_op2_fully_live (GET_CODE (src)))
The CONSTANT_P (y) condition is redundant given the above.
But couldn't the binop_implies_op2_fully_live be shared by both paths?
Maybe it could be tested...
Yea, that was a fun little twisty maze. The CONSTANT_P is definitely
redundant. I had a bunch of fixes related to shifts in the code
initially, but Jivan and I slowly cleaned up and refactored over time;
I think there's still some work to do here.
+
+ do
+ {
+ qin = qout = worklist;
+
+ /* Put every block on the worklist. */
+ int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
+ int n = inverted_rev_post_order_compute (cfun, rpo);
Does this need to be recalculated each iteration, or could it be done
outside the loop?
I think it can/should move outside the loop. Done.
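i.e., roughly (sketch, with the rest of the loop unchanged):

  int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
  int n = inverted_rev_post_order_compute (cfun, rpo);
  do
    {
      qin = qout = worklist;
      /* ... seed the worklist from RPO and iterate to a fixed
         point as before ...  */
    }
  while (/* worklist not empty */);
  free (rpo);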
Was also wondering if this could use a DF_BACKWARD df_simple_dataflow
(with its precomputed order), rather than doing the worklist management
manually. If df_simple_dataflow isn't efficient enough then we should
try to fix that...
I'll have to take a look. I actually never really had to look at this
code as all the fixes/cleanups were in the sets/uses processing side.
The liveness iteration and such "just worked".
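If it pans out, the call would look roughly like the sketch below;
ext_dce_init, ext_dce_confluence_n and ext_dce_transfer are made-up
names for wrappers around the existing init, per-edge merge and
per-block processing, and all_blocks is a bitmap with a bit set for
every basic block index:

  df_simple_dataflow (DF_BACKWARD, ext_dce_init, NULL,
                      ext_dce_confluence_n, ext_dce_transfer,
                      all_blocks, df_get_postorder (DF_BACKWARD),
                      df_get_n_blocks (DF_BACKWARD));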
Thanks a ton for all the feedback. I've got several of the fixes done
and I'll throw them into testing today and get a V2 out as a few people
are playing with this. V2 won't have fixes for everything you've
pointed out, so it probably won't be worth the time to review the V2
knowing a V3 is inevitable.
jeff