Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

Jeff Law Wed, 04 Oct 2023 15:05:49 -0700



On 10/3/23 05:45, Manolis Tsamis wrote:

This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

   addi t4,sp,16
   add  t2,a6,t4
   shl  t3,t2,1
   ld   a2,0(t3)
   addi a2,1
   sd   a2,8(t2)

into the following (one instruction less):

   add  t2,a6,sp
   shl  t3,t2,1
   ld   a2,32(t3)
   addi a2,1
   sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.
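
To make the arithmetic in the example above concrete, here is a toy model of how the folded constant travels through the dependent instructions. It is only a sketch of the idea, not code from fold-mem-offsets.cc; the function name and its structure are made up for illustration.

```python
# Toy model, not GCC code: follow only the constant part that the removed
# `addi t4,sp,16` contributed to each register in the example.

def fold_offsets(folded_imm):
    """Return the new (ld, sd) displacements once the addi is folded away."""
    t4 = folded_imm      # addi t4,sp,16 -> t4 carries +16
    t2 = t4              # add  t2,a6,t4 -> an add passes the constant through
    t3 = t2 << 1         # shl  t3,t2,1  -> a shift by 1 doubles the constant
    ld_disp = 0 + t3     # ld a2,0(t3)   -> original displacement 0
    sd_disp = 8 + t2     # sd a2,8(t2)   -> original displacement 8
    return ld_disp, sd_disp

print(fold_offsets(16))  # (32, 24)
```

The result matches the transformed sequence: the load uses 32(t3) and the store uses 24(t2).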

gcc/ChangeLog:

        * Makefile.in: Add fold-mem-offsets.o.
        * passes.def: Schedule a new pass.
        * tree-pass.h (make_pass_fold_mem_offsets): Declare.
        * common.opt: New options.
        * doc/invoke.texi: Document new option.
        * fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/fold-mem-offsets-1.c: New test.
        * gcc.target/riscv/fold-mem-offsets-2.c: New test.
        * gcc.target/riscv/fold-mem-offsets-3.c: New test.

Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu>

So I was ready to ACK, but realized there weren't any test results for a primary platform mentioned. So I ran this on x86.


It's triggering one regression (code quality).

Specifically gcc.target/i386/pr52146.c

The f-m-o code is slightly worse than without f-m-o.

Without f-m-o we get this:

   9 0000 B88000E0              movl    $-18874240, %eax
   9      FE
  10 0005 67C70000              movl    $0, (%eax)
  10      000000
  11 000c C3                    ret

With f-m-o we get this:

   9 0000 B8000000              movl    $0, %eax
   9      00
  10 0005 67C78080              movl    $0, -18874240(%eax)
  10      00E0FE00
  10      000000
  11 0010 C3                    ret

The key being that we don't get rid of the original move instruction, nor does the original move instruction get smaller due to simplification of its constant. Additionally, the memory store gets larger. The net is a 4 byte increase in code size.
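
The 4 byte figure can be checked by counting the encoding bytes in the two listings above. A small sketch; the variable and function names are only for illustration, and the hex groups are copied from the listings as shown.

```python
# Encoding bytes copied from the two assembler listings above, one string
# per hex group as printed in the listing.
without_fmo = ["B88000E0", "FE", "67C70000", "000000", "C3"]
with_fmo = ["B8000000", "00", "67C78080", "00E0FE00", "000000", "C3"]

def code_size(hex_groups):
    """Total number of bytes encoded by a list of hex strings."""
    return sum(len(bytes.fromhex(group)) for group in hex_groups)

size_without = code_size(without_fmo)
size_with = code_size(with_fmo)
print(size_without, size_with, size_with - size_without)  # 13 17 4
```

The move stays at 5 bytes in both versions, while the store grows from 7 to 11 bytes because it now carries a 32-bit displacement.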

This is probably a fairly rare scenario and the original bug report was for a correctness issue in using addresses in the range 0x80000000..0xffffffff in x32. So I wouldn't lose any sleep if we adjusted the test to pass -fno-fold-mem-offsets. But before doing that I wanted to give you the chance to ponder if this is something you'd prefer to improve in f-m-o itself. At some level if the base register collapses down to 0, then we could take the offset as a constant address and try to recognize that form. If that fails, then just consider the change unprofitable rather than trying to recognize it as reg+d.
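
Roughly, the decision I have in mind looks like the sketch below. This is not code from the patch; the function, its parameters and the returned labels are hypothetical and only model the order of the checks.

```python
# Hypothetical decision sketch, not GCC code. `base_is_zero` says whether the
# base register collapses to constant 0 after folding; `const_addr_recognized`
# stands in for the target accepting the bare constant address form.

def choose_address_form(base_is_zero, const_addr_recognized):
    if not base_is_zero:
        # Ordinary case: keep the existing reg+d handling.
        return "reg+d"
    if const_addr_recognized:
        # Base is 0, so the offset alone is the address.
        return "constant address"
    # Do not fall back to reg+d here; leave the code as it was.
    return "unprofitable"

print(choose_address_form(True, False))  # unprofitable
```

For pr52146.c, the base collapses to 0, so the pass would either emit a constant address or leave the original sequence alone.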


Anyway, waiting to hear your thoughts...

If we do a V7, then we need to fix one spelling issue that shows up in several places (if we go with the v6 we can just fix it prior to committing). Specifically in several places we need to replace "recognised" with "recognized".



jeff
