On Thu, Oct 5, 2023 at 1:05 AM Jeff Law <jeffreya...@gmail.com> wrote:
>
>
>
> On 10/3/23 05:45, Manolis Tsamis wrote:
> > This is a new RTL pass that tries to optimize memory offset calculations
> > by moving them from add immediate instructions to the memory loads/stores.
> > For example it can transform this:
> >
> >    addi t4,sp,16
> >    add  t2,a6,t4
> >    shl  t3,t2,1
> >    ld   a2,0(t3)
> >    addi a2,1
> >    sd   a2,8(t2)
> >
> > into the following (one instruction less):
> >
> >    add  t2,a6,sp
> >    shl  t3,t2,1
> >    ld   a2,32(t3)
> >    addi a2,1
> >    sd   a2,24(t2)
> >
> > Although there are places where this is done already, this pass is more
> > powerful and can handle the more difficult cases that are currently not
> > optimized. Also, it runs late enough and can optimize away unnecessary
> > stack pointer calculations.
> >
> > gcc/ChangeLog:
> >
> >       * Makefile.in: Add fold-mem-offsets.o.
> >       * passes.def: Schedule a new pass.
> >       * tree-pass.h (make_pass_fold_mem_offsets): Declare.
> >       * common.opt: New options.
> >       * doc/invoke.texi: Document new option.
> >       * fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> >       * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> >       * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> >
> > Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu>
>
>
> So I was ready to ACK, but realized there weren't any testresults for a
> primary platform mentioned.  So I ran this on x86.
>
> It's triggering one regression (code quality).
>
> Specifically gcc.target/i386/pr52146.c
>
> The f-m-o code is slightly worse than without f-m-o.
>
> Without f-m-o we get this:
>
>     9 0000 B88000E0              movl    $-18874240, %eax
>     9      FE
>    10 0005 67C70000              movl    $0, (%eax)
>    10      000000
>    11 000c C3                    ret
>
> With f-m-o we get this:
>
>     9 0000 B8000000              movl    $0, %eax
>     9      00
>    10 0005 67C78080              movl    $0, -18874240(%eax)
>    10      00E0FE00
>    10      000000
>    11 0010 C3                    ret
>
>
> The key being that we don't get rid of the original move instruction,
> nor does the original move instruction get smaller due to simplification
> of its constant.  Additionally, the memory store gets larger.  The net
> is a 4 byte increase in code size.
>
Yes, this case is not good for f-m-o. In theory there could be a cost
calculation step that tries to estimate the benefit of a
transformation, but given that f-m-o cannot transform code in a way
that we have big regressions it's unclear to me whether complicating
the code is worth it. At least if we can solve the issues in other
ways (also see discussion below).

>
> This is probably a fairly rare scenario and the original bug report was
> for a correctness issue in using addresses in the range
> 0x80000000..0xffffffff in x32.  So I wouldn't lose any sleep if we
> adjusted the test to pass -fno-fold-mem-offsets.  But before doing that
> I wanted to give you the chance to ponder if this is something you'd
> prefer to improve in f-m-o itself.   At some level if the base register
> collapses down to 0, then we could take the offset as a constant address
> and try to recognize that form.  If that fails, then just consider the
> change unprofitable rather than trying to recognize it as reg+d.
>
> Anyway, waiting to hear your thoughts...
>
Yes, this testcase has been bugging me too, I have brought that up in
previous iterations as well.
I'm also not sure whether this is a code quality or a correctness
issue? From what I understand from the relevant ticket, if we emit
movl $0, -18874240 then it's wrong code?

With regards to the "recognize that the base register is 0", that
would be nice but how would we recognise that? f-m-o can only
calculate the folded offset but that is not enough to prove that the
base register is zero or not.

One thought that I've had is that this started being an issue on x86
when I enabled folding of mv REG, INT in addition to the existing ADD
REG, REG, INT. The idea was that a move will be folded to mv REG, 0
and on targets that we have a zero register that can be beneficial for
a number of reasons... but on x86 we don't have a zero register so the
benefit is much more limited anyway. So maybe we could disable folding
of moves on targets that don't have a zero register? That would solve
the issue and I believe it also makes sense in general. If so, is
there a way to query wether the target has such register?

Thoughts?

> If we do a V7, then we need to fix one spelling issue that shows up in
> several places (if we go with the v6 we can just fix it prior to
> committing).  Specifically in several places we need to replace
> "recognised" with "recognized".
>
Ok, I can do that :)

Manolis
>
> jeff

Reply via email to