https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048
--- Comment #12 from Wilco <wdijkstr at arm dot com> ---
(In reply to Jiong Wang from comment #11)
> (In reply to Richard Henderson from comment #10)
> > Created attachment 37890 [details]
> > second patch
> >
> > Still going through full testing, but I wanted to post this
> > before the end of the day.
> >
> > This update includes a virt_or_elim_regno_p, as discussed in #c7/#c8.
> >
> > It also updates aarch64_legitimize_address to treat R0+R1+C as a special
> > case of R0+(R1*S)+C.  All of the arguments wrt scaling apply to unscaled
> > indices as well.
> >
> > As a minor point, doing some of the expansion in a slightly different
> > order results in less garbage rtl being generated in the process.
>
> Richard,
>
>   I just recalled that the reassociation of a constant offset with the
> virtual frame pointer will increase register pressure, and thus cause bad
> code generation in some situations. For example, the testcase given at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173#c8
>
> void bar(int i)
> {
>   char A[10];
>   char B[10];
>   char C[10];
>   g(A);
>   g(B);
>   g(C);
>   f(A[i]);
>   f(B[i]);
>   f(C[i]);
>   return;
> }
>
> Before your patch we are generating (-O2)
> ===
> bar:
>         stp     x29, x30, [sp, -80]!
>         add     x29, sp, 0
>         add     x1, x29, 80
>         str     x19, [sp, 16]
>         mov     w19, w0
>         add     x0, x29, 32
>         add     x19, x1, x19, sxtw
>         bl      g
>         add     x0, x29, 48
>         bl      g
>         add     x0, x29, 64
>         bl      g
>         ldrb    w0, [x19, -48]
>         bl      f
>         ldrb    w0, [x19, -32]
>         bl      f
>         ldrb    w0, [x19, -16]
>         bl      f
>         ldr     x19, [sp, 16]
>         ldp     x29, x30, [sp], 80
>         ret
>
> After your patch, we are generating:
> ===
> bar:
>         stp     x29, x30, [sp, -96]!
>         add     x29, sp, 0
>         stp     x21, x22, [sp, 32]
>         add     x22, x29, 48
>         stp     x19, x20, [sp, 16]
>         mov     w19, w0
>         mov     x0, x22
>         add     x21, x29, 64
>         add     x20, x29, 80
>         bl      g
>         mov     x0, x21
>         bl      g
>         mov     x0, x20
>         bl      g
>         ldrb    w0, [x22, w19, sxtw]
>         bl      f
>         ldrb    w0, [x21, w19, sxtw]
>         bl      f
>         ldrb    w0, [x20, w19, sxtw]
>         bl      f
>         ldp     x19, x20, [sp, 16]
>         ldp     x21, x22, [sp, 32]
>         ldp     x29, x30, [sp], 96
>         ret
>
> We are using more callee-saved registers, thus extra stp/ldp are generated.
>
> But we will benefit from reassociating the constant offset with the virtual
> frame pointer if it's inside a loop, because:
>
>   * vfp + const_offset is loop invariant
>   * the virtual reg elimination on vfp will eventually generate one
>     extra instruction if it was not used with const_offset but another reg.
>
> Thus after this reassociation, rtl IVOPT can hoist it out of the loop, and
> we will save two instructions in the loop.
>
> A fix was proposed for loop-invariant.c to only do such reshuffling for
> loops, see https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01253.html. That
> patch finally stopped because the issue PR62173 was fixed on tree level, and
> the pointer re-shuffling was considered to have a hidden overflow risk,
> though it will be very rare.

I don't believe this is really worse - if we had, say, the same example with
3 pointers or 3 global arrays we should get the exact same code (and in fact
generating the same canonicalized form for different bases and scales is
essential). Once you've done that you can try optimizing accesses which differ
by a *small* constant offset.
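
For reference, the "3 global arrays" variant mentioned above could be sketched as follows. This is only an illustration, not a testcase from the PR: the bodies of f and g and the `sum` accumulator are hypothetical placeholders so that the snippet is self-contained.

```c
#include <stddef.h>

/* Three global arrays instead of the three stack arrays in the
   quoted testcase; each access is now base + index with three
   unrelated bases rather than vfp + const + index. */
char A[10];
char B[10];
char C[10];

/* Hypothetical accumulator so that f has an observable effect. */
int sum;

/* Placeholder bodies (assumptions): the quoted testcase leaves
   f and g as external functions. */
void g(char *p)
{
  for (size_t k = 0; k < 10; k++)
    p[k] = (char) k;
}

void f(char c)
{
  sum += c;
}

void bar(int i)
{
  g(A);
  g(B);
  g(C);
  f(A[i]);
  f(B[i]);
  f(C[i]);
}
```

To compare the generated addressing with the stack-array version, f and g would need to stay external declarations as in the quoted testcase, since with the bodies visible the calls may be inlined.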