On Thu, Oct 5, 2023 at 1:05 AM Jeff Law <jeffreya...@gmail.com> wrote: > > > > On 10/3/23 05:45, Manolis Tsamis wrote: > > This is a new RTL pass that tries to optimize memory offset calculations > > by moving them from add immediate instructions to the memory loads/stores. > > For example it can transform this: > > > > addi t4,sp,16 > > add t2,a6,t4 > > shl t3,t2,1 > > ld a2,0(t3) > > addi a2,1 > > sd a2,8(t2) > > > > into the following (one instruction less): > > > > add t2,a6,sp > > shl t3,t2,1 > > ld a2,32(t3) > > addi a2,1 > > sd a2,24(t2) > > > > Although there are places where this is done already, this pass is more > > powerful and can handle the more difficult cases that are currently not > > optimized. Also, it runs late enough and can optimize away unnecessary > > stack pointer calculations. > > > > gcc/ChangeLog: > > > > * Makefile.in: Add fold-mem-offsets.o. > > * passes.def: Schedule a new pass. > > * tree-pass.h (make_pass_fold_mem_offsets): Declare. > > * common.opt: New options. > > * doc/invoke.texi: Document new option. > > * fold-mem-offsets.cc: New file. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/fold-mem-offsets-1.c: New test. > > * gcc.target/riscv/fold-mem-offsets-2.c: New test. > > * gcc.target/riscv/fold-mem-offsets-3.c: New test. > > > > Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu> > > > So I was ready to ACK, but realized there weren't any testresults for a > primary platform mentioned. So I ran this on x86. > > It's triggering one regression (code quality). > > Specifically gcc.target/i386/pr52146.c > > The f-m-o code is slightly worse than without f-m-o. > > Without f-m-o we get this: > > 9 0000 B88000E0 movl $-18874240, %eax > 9 FE > 10 0005 67C70000 movl $0, (%eax) > 10 000000 > 11 000c C3 ret > > With f-m-o we get this: > > 9 0000 B8000000 movl $0, %eax > 9 00 > 10 0005 67C78080 movl $0, -18874240(%eax) > 10 00E0FE00 > 10 000000 > 11 0010 C3 ret > > > The key being that we don't get rid of the original move instruction, > nor does the original move instruction get smaller due to simplification > of its constant. Additionally, the memory store gets larger. The net > is a 4 byte increase in code size. > Yes, this case is not good for f-m-o. In theory there could be a cost calculation step that tries to estimate the benefit of a transformation, but given that f-m-o cannot transform code in a way that we have big regressions it's unclear to me whether complicating the code is worth it. At least if we can solve the issues in other ways (also see discussion below).
> > This is probably a fairly rare scenario and the original bug report was > for a correctness issue in using addresses in the range > 0x80000000..0xffffffff in x32. So I wouldn't lose any sleep if we > adjusted the test to pass -fno-fold-mem-offsets. But before doing that > I wanted to give you the chance to ponder if this is something you'd > prefer to improve in f-m-o itself. At some level if the base register > collapses down to 0, then we could take the offset as a constant address > and try to recognize that form. If that fails, then just consider the > change unprofitable rather than trying to recognize it as reg+d. > > Anyway, waiting to hear your thoughts... > Yes, this testcase has been bugging me too, I have brought that up in previous iterations as well. I'm also not sure whether this is a code quality or a correctness issue? From what I understand from the relevant ticket, if we emit movl $0, -18874240 then it's wrong code? With regards to the "recognize that the base register is 0", that would be nice but how would we recognise that? f-m-o can only calculate the folded offset but that is not enough to prove that the base register is zero or not. One thought that I've had is that this started being an issue on x86 when I enabled folding of mv REG, INT in addition to the existing ADD REG, REG, INT. The idea was that a move will be folded to mv REG, 0 and on targets that we have a zero register that can be beneficial for a number of reasons... but on x86 we don't have a zero register so the benefit is much more limited anyway. So maybe we could disable folding of moves on targets that don't have a zero register? That would solve the issue and I believe it also makes sense in general. If so, is there a way to query wether the target has such register? Thoughts? > If we do a V7, then we need to fix one spelling issue that shows up in > several places (if we go with the v6 we can just fix it prior to > committing). Specifically in several places we need to replace > "recognised" with "recognized". > Ok, I can do that :) Manolis > > jeff