[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

wdijkstr at arm dot com Mon, 07 Mar 2016 15:46:33 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048


--- Comment #12 from Wilco <wdijkstr at arm dot com> ---
(In reply to Jiong Wang from comment #11)
> (In reply to Richard Henderson from comment #10)
> > Created attachment 37890 [details]
> > second patch
> > 
> > Still going through full testing, but I wanted to post this
> > before the end of the day.
> > 
> > This update includes a virt_or_elim_regno_p, as discussed in #c7/#c8.
> > 
> > It also updates aarch64_legitimize_address to treat R0+R1+C as a special
> > case of R0+(R1*S)+C.  All of the arguments wrt scaling apply to unscaled
> > indices as well.
> > 
> > As a minor point, doing some of the expansion in a slightly different
> > order results in less garbage rtl being generated in the process.
> 
> Richard,
> 
>   I just recalled that reassociating the constant offset with the virtual
> frame pointer increases register pressure and thus causes bad code generation
> in some situations. For example, the testcase given at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173#c8
> 
> void bar(int i)              
> {                        
>   char A[10];
>   char B[10];     
>   char C[10];       
>   g(A);                        
>   g(B);
>   g(C);                              
>   f(A[i]);                 
>   f(B[i]);                        
>   f(C[i]);                   
>   return;               
> } 
> 
>   Before your patch we generate (-O2):
>   ===
> bar:
>         stp     x29, x30, [sp, -80]!
>         add     x29, sp, 0
>         add     x1, x29, 80
>         str     x19, [sp, 16]
>         mov     w19, w0
>         add     x0, x29, 32
>         add     x19, x1, x19, sxtw
>         bl      g
>         add     x0, x29, 48
>         bl      g
>         add     x0, x29, 64
>         bl      g
>         ldrb    w0, [x19, -48]
>         bl      f
>         ldrb    w0, [x19, -32]
>         bl      f
>         ldrb    w0, [x19, -16]
>         bl      f
>         ldr     x19, [sp, 16]
>         ldp     x29, x30, [sp], 80
>         ret
> 
>   After your patch, we are generating:
>   ===
> bar:
>         stp     x29, x30, [sp, -96]!
>         add     x29, sp, 0
>         stp     x21, x22, [sp, 32]
>         add     x22, x29, 48
>         stp     x19, x20, [sp, 16]
>         mov     w19, w0
>         mov     x0, x22
>         add     x21, x29, 64
>         add     x20, x29, 80
>         bl      g
>         mov     x0, x21
>         bl      g
>         mov     x0, x20
>         bl      g
>         ldrb    w0, [x22, w19, sxtw]
>         bl      f
>         ldrb    w0, [x21, w19, sxtw]
>         bl      f
>         ldrb    w0, [x20, w19, sxtw]
>         bl      f
>         ldp     x19, x20, [sp, 16]
>         ldp     x21, x22, [sp, 32]
>         ldp     x29, x30, [sp], 96
>         ret
> 
>   We are using more callee-saved registers, so extra stp/ldp instructions
> are generated.
> 
>   But we do benefit from reassociating the constant offset with the virtual
> frame pointer when the access is inside a loop, because:
> 
>    * vfp + const_offset is loop invariant;
>    * eliminating the virtual frame pointer eventually generates one extra
>      instruction when vfp is combined with another register rather than
>      with const_offset.
> 
>   Thus, after this reassociation, RTL IVOPT can hoist it out of the loop,
> and we save two instructions in the loop.
> 
>   A fix was proposed for loop-invariant.c to do such reshuffling only for
> loops, see https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01253.html.  Work on
> that patch eventually stopped because PR62173 was fixed at the tree level, and
> the pointer reshuffling was considered to carry a hidden overflow risk,
> though a very rare one.

I don't believe this is really worse: if we had, say, the same example with 3
pointers or 3 global arrays, we should get exactly the same code (and in fact,
generating the same canonicalized form for different bases and scales is
essential). Once you've done that, you can try optimizing accesses which differ
by a *small* constant offset.
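The comparison suggested above can be tried by compiling the three shapes side by side. This is a sketch written for this note: `bar_local` is the PR62173 testcase, while `bar_global`, `bar_ptr`, `GA`/`GB`/`GC` and the bodies of `f()` and `g()` are hypothetical, added only so the file runs. The expectation stated above is that, with canonicalized addressing, all three `bar_*` functions use the same base/index pattern. To inspect that with `gcc -O2 -S` on AArch64, move `f()` and `g()` into a separate file: when callees are defined earlier in the same translation unit, `-fipa-ra` (enabled at -O2) lets GCC keep values in call-clobbered registers the callees do not touch, which changes exactly the callee-saved register usage under discussion. `noinline` only prevents inlining.

```c
#include <assert.h>

/* Hypothetical stand-ins for the opaque g() and f() of the testcase:
   g() fills its array with a per-call pattern, f() logs its argument. */
static int calls_to_g;
static int logged[16];
static int nlogged;

__attribute__((noinline)) void g(char *p)
{
  calls_to_g++;
  for (int k = 0; k < 10; k++)
    p[k] = (char)(calls_to_g * 10 + k);  /* stays below 128 for 9 calls */
}

__attribute__((noinline)) void f(int v)
{
  assert(nlogged < 16);
  logged[nlogged++] = v;
}

/* Original form (PR62173#c8): three local arrays. */
void bar_local(int i)
{
  char A[10], B[10], C[10];
  g(A); g(B); g(C);
  f(A[i]); f(B[i]); f(C[i]);
}

/* Same accesses through three global arrays. */
char GA[10], GB[10], GC[10];

void bar_global(int i)
{
  g(GA); g(GB); g(GC);
  f(GA[i]); f(GB[i]); f(GC[i]);
}

/* Same accesses through three incoming pointers. */
void bar_ptr(char *a, char *b, char *c, int i)
{
  g(a); g(b); g(c);
  f(a[i]); f(b[i]); f(c[i]);
}
```

Calling `bar_local(3)`, `bar_global(3)` and `bar_ptr(x, y, z, 3)` in that order logs 13, 23, 33, 43, 53, 63, 73, 83, 93, which only confirms the three variants are functionally equivalent; the interesting part is the assembly.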
