Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

Manolis Tsamis Mon, 16 Oct 2023 11:13:14 -0700

On Tue, Oct 10, 2023 at 5:15 PM Jeff Law <jeffreya...@gmail.com> wrote:
>
>
>
> On 10/10/23 05:59, Manolis Tsamis wrote:
>
> >> It's a code quality issue as long as we don't transform the code into
> >> movl $0, -18874240, at which point it would become a correctness issue.
> >>
> > Ok, thanks for pointing that out as I thought that movl $0, -18874240
> > and movl $0, -18874240(eax) with eax == 0 were the same.
> It's a quirk of x32 related to sign extension of the address.  Certainly
> not obvious!  But it's better than the PA where:
>
> (mem (plus (reg A) (reg B)))
>
> Is semantically different than
>
> (mem (plus (reg B) (reg A)))
>
>
> Due to the implicit segment register selection that happens on the high
> bits of the base register rather than the high bits of the effective
> address.
>
I see, thanks for the additional info


>
> >
> >>
> >>>
> >>> With regards to the "recognize that the base register is 0", that
> >>> would be nice but how would we recognise that? f-m-o can only
> >>> calculate the folded offset but that is not enough to prove that the
> >>> base register is zero or not.
> >> It's a chain of insns that produce an address and use it in the memory
> >> reference.  We essentially changed the first insn in the chain from movl
> >> -18874240, %eax into movl 0, %eax.  So we'd have to someone note that
> >> the base register in the memory reference has the value zero in the
> >> chain of instructions.  That may or may not be reasonable to do.
> >>
> > Ok, do you believe it would be worthwhile to add a REG_EQ note or
> > something similar? I assume some REG_EQ notes will be added from the
> > following cprop-hardreg pass too?
> I think it really depends on the degree of difficulty and whether or not
> we think there are other cases like this one lurking.
>
> Given this scenario can only happen with an absolute address, I would
> probably explore if there's a way to trivially detect this case, but I
> wouldn't spend a lot of time on it.
>
> And I wasn't necessarily thinking of a note in the RTL, just a bit of
> state within f-m-o when updating an arithmetic chain so that we could
> determine that the final instruction was a memory reference to an
> absolute address.
>
Ok, although I didn't include this in v7, I'll do some exploration for
a possible future change.
>
>
>
> >
> >> You could use the costing model to cost the entire sequence
> >> before/after.  There's an interface to walk a sequence and return a
> >> cost.  In the case of f-m-o the insns are part of the larger chain, so
> >> we'd need a different API.
> >>
> >> The other option would be to declare this is known, but not important
> >> issue.  I would think that it's going to be rare to have absolute
> >> addresses and x32 isn't really used much.  The combination of the two
> >> should be exceedingly rare.  Hence my willingness to use
> >> -fno-fold-mem-offsets in the test.
> >>
> > Yes, I now better understand why you suggested adding
> > -fno-fold-mem-offsets to the testcase. Or we could also make the test
> > not fail in this case where the memory access has the base register,
> > as there's no correctness issue.
> :)
>
>
> >
> > Then, going back to the code quality regression, wouldn't things be
> > much better if we would emit xor eax, eax instead of mov eax, 0? In
> > that case xor eax, eax should be 2 bytes instead of 5 for mov eax, 0
> > and hence the code size difference should be trivial. Then we can keep
> > f-m-o as is, including this testcase.
> The selection of xor vs mov imm is handled elsewhere.  On some uarchs
> the former is preferred, on others the latter.
>
> If you think about xor %eax,%eax as an example.  Unless you special case
> that scenario in your cpu front-end, the xor will have a data dependency
> on any prior set of %eax which inhibits ILP.
>
>
> >
> > Is there a way to emit a taret-specific optimized register zero insn?
> > If so I'll use that and adjust the testcase as needed, and I think
> > we're done with this one.
> I can live with just adjusting the testcase.  Let's go with that.
>
Done! Sent as v7 together with the recognised -> recognized change and
INSN_CODE(insn) = -1 for the bug you reported.

Thanks,
Manolis
> jeff

Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

Reply via email to