On 10/3/23 05:45, Manolis Tsamis wrote:
This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

   addi t4,sp,16
   add  t2,a6,t4
   shl  t3,t2,1
   ld   a2,0(t3)
   addi a2,1
   sd   a2,8(t2)

into the following (one instruction less):

   add  t2,a6,sp
   shl  t3,t2,1
   ld   a2,32(t3)
   addi a2,1
   sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.
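
As a quick sanity check of the example above, the following minimal sketch models both sequences and computes the addresses used by the load and the store. The function names and register values are hypothetical and chosen only for illustration; this is not code from the pass.

```python
# Hypothetical model of the two instruction sequences from the example.
# Only the address arithmetic is modelled; the loaded value is ignored.

def addresses_before(sp, a6):
    t4 = sp + 16          # addi t4,sp,16
    t2 = a6 + t4          # add  t2,a6,t4
    t3 = t2 << 1          # shl  t3,t2,1
    load_addr = t3 + 0    # ld   a2,0(t3)
    store_addr = t2 + 8   # sd   a2,8(t2)
    return load_addr, store_addr


def addresses_after(sp, a6):
    t2 = a6 + sp          # add  t2,a6,sp
    t3 = t2 << 1          # shl  t3,t2,1
    load_addr = t3 + 32   # ld   a2,32(t3)
    store_addr = t2 + 24  # sd   a2,24(t2)
    return load_addr, store_addr
```

The folded constant of 16 is doubled by the shift, which is why the load offset becomes 32 while the store offset only grows by 16 (from 8 to 24).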

gcc/ChangeLog:

        * Makefile.in: Add fold-mem-offsets.o.
        * passes.def: Schedule a new pass.
        * tree-pass.h (make_pass_fold_mem_offsets): Declare.
        * common.opt: New options.
        * doc/invoke.texi: Document new option.
        * fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/fold-mem-offsets-1.c: New test.
        * gcc.target/riscv/fold-mem-offsets-2.c: New test.
        * gcc.target/riscv/fold-mem-offsets-3.c: New test.

Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu>


So I was ready to ACK, but realized there weren't any test results mentioned for a primary platform. So I ran this on x86.

It's triggering one regression (code quality).

Specifically gcc.target/i386/pr52146.c

The f-m-o code is slightly worse than without f-m-o.

Without f-m-o we get this:

   9 0000 B88000E0              movl    $-18874240, %eax
   9      FE
  10 0005 67C70000              movl    $0, (%eax)
  10      000000
  11 000c C3                    ret

With f-m-o we get this:

   9 0000 B8000000              movl    $0, %eax
   9      00
  10 0005 67C78080              movl    $0, -18874240(%eax)
  10      00E0FE00
  10      000000
  11 0010 C3                    ret


The key is that we don't get rid of the original move instruction, nor does it get smaller through simplification of its constant. Additionally, the memory store gets larger. The net is a 4-byte increase in code size.
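
The size difference can be recomputed from the encoded bytes in the two listings. This is only a sketch: the dictionary and function names are made up for illustration, and the hex strings are copied from the listings above.

```python
# Encoded bytes copied from the two assembler listings in this message.
without_fmo = {
    "movl $-18874240, %eax": "B88000E0FE",
    "movl $0, (%eax)": "67C70000000000",
    "ret": "C3",
}
with_fmo = {
    "movl $0, %eax": "B800000000",
    "movl $0, -18874240(%eax)": "67C7808000E0FE00000000",
    "ret": "C3",
}


def code_size(listing):
    """Total size in bytes; two hex digits encode one byte."""
    return sum(len(hex_bytes) // 2 for hex_bytes in listing.values())
```

The first move stays at 5 bytes in both versions, while the store grows from 7 to 11 bytes because it now carries a 32-bit displacement.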


This is probably a fairly rare scenario, and the original bug report was for a correctness issue in using addresses in the range 0x80000000..0xffffffff in x32. So I wouldn't lose any sleep if we adjusted the test to pass -fno-fold-mem-offsets. But before doing that I wanted to give you the chance to ponder whether this is something you'd prefer to improve in f-m-o itself. At some level, if the base register collapses down to 0, then we could take the offset as a constant address and try to recognize that form. If that fails, then just consider the change unprofitable rather than trying to recognize it as reg+d.
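
A rough sketch of that decision, written as standalone Python rather than against GCC's internals. Every name here is hypothetical: `base_is_zero` stands for "the base register is known to hold 0 after folding", and `recognizes_constant_address` stands in for whatever check the target would use to accept a constant address.

```python
def choose_address_form(base_is_zero, folded_offset, recognizes_constant_address):
    """Return the address form to try, or None to reject the fold.

    All parameters are illustrative stand-ins, not GCC APIs.
    """
    if not base_is_zero:
        # Ordinary case: keep the base register and fold the displacement.
        return ("reg+d", folded_offset)
    if recognizes_constant_address(folded_offset):
        # Base collapsed to 0: use the offset directly as the address.
        return ("const", folded_offset)
    # Constant form not recognized: treat the fold as unprofitable
    # instead of falling back to reg+d with a zero base.
    return None
```

For the pr52146.c case this would either produce a constant-address store or leave the original code alone, assuming the predicate behaves as described.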

Anyway, waiting to hear your thoughts...

If we do a v7, then we need to fix one spelling issue that shows up in several places (if we go with the v6, we can just fix it prior to committing). Specifically, we need to replace "recognised" with "recognized" in each of those places.


jeff
