On 10/3/23 05:45, Manolis Tsamis wrote:
This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:
addi t4,sp,16
add t2,a6,t4
shl t3,t2,1
ld a2,0(t3)
addi a2,1
sd a2,8(t2)
into the following (one instruction less):
add t2,a6,sp
shl t3,t2,1
ld a2,32(t3)
addi a2,1
sd a2,24(t2)
Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.
gcc/ChangeLog:
* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.
Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu>
So I was ready to ACK, but realized there weren't any testresults for a
primary platform mentioned. So I ran this on x86.
It's triggering one regression (code quality).
Specifically gcc.target/i386/pr52146.c
The f-m-o code is slightly worse than without f-m-o.
Without f-m-o we get this:
9 0000 B88000E0 movl $-18874240, %eax
9 FE
10 0005 67C70000 movl $0, (%eax)
10 000000
11 000c C3 ret
With f-m-o we get this:
9 0000 B8000000 movl $0, %eax
9 00
10 0005 67C78080 movl $0, -18874240(%eax)
10 00E0FE00
10 000000
11 0010 C3 ret
The key being that we don't get rid of the original move instruction,
nor does the original move instruction get smaller due to simplification
of its constant. Additionally, the memory store gets larger. The net
is a 4 byte increase in code size.
This is probably a fairly rare scenario and the original bug report was
for a correctness issue in using addresses in the range
0x80000000..0xffffffff in x32. So I wouldn't lose any sleep if we
adjusted the test to pass -fno-fold-mem-offsets. But before doing that
I wanted to give you the chance to ponder if this is something you'd
prefer to improve in f-m-o itself. At some level if the base register
collapses down to 0, then we could take the offset as a constant address
and try to recognize that form. If that fails, then just consider the
change unprofitable rather than trying to recognize it as reg+d.
Anyway, waiting to hear your thoughts...
If we do a V7, then we need to fix one spelling issue that shows up in
several places (if we go with the v6 we can just fix it prior to
committing). Specifically in several places we need to replace
"recognised" with "recognized".
jeff