On Tue, 2016-10-04 at 12:53 +0000, Wilco Dijkstra wrote:
> GCC currently doesn't canonicalize address expressions. As a result
> inefficient code is generated even for trivial index address
> expressions,
> blocking CSE and other optimizations:
> 
> int f(int *p, int i) { return p[i+2] + p[i+1]; }
> 
>       sxtw    x1, w1
>       add     x1, x1, 2
>       add     x2, x0, x1, lsl 2
>       ldr     w0, [x0, x1, lsl 2]
>       ldr     w1, [x2, -4]
>       add     w0, w1, w0
>       ret
> 
> After this patch:
> 
>       add     x1, x0, x1, sxtw 2
>       ldp     w0, w2, [x1, 4]
>       add     w0, w2, w0
>       ret
> 
> The reason for this is that array index expressions are preferably
> kept in the *(p + (i + C0) * C1) form even though on most targets it
> is best to make use of an offset in memory accesses - i.e.
> *(p + i * C1 + (C0*C1)).
> 
> This patch disables the folding in fold_plusminus_mult_expr that
> changes the latter form into the former.  Unfortunately it isn't
> possible to know it is an address expression, and neither is there a
> way to decide when C0*C1 is too complex. 
> 
> So is there a better way/place to do this, or do we need an address
> canonicalization phase in the tree that ensures we expand addresses
> in an efficient manner, taking into account target offsets?

There's been an effort to implement address mode selection (AMS)
optimization in GCC as part of the GSoC program.  It hasn't been
mainlined yet and is currently SH-only, but I'd like to move it
forward and make it available to other backends, too.

It's an RTL pass and works by analyzing memory accesses inside basic
blocks, figuring out the effective address expressions, querying the
backend for address mode alternatives for each memory access and the
associated costs.  With that information it tries to find a minimal
solution (minimizing address register calculations and minimizing
address mode alternative costs), which is currently implemented with
backtracking.

For SH, the AMS pass can convert your example above from this

_f:
        mov     r5,r0
        add     #2,r0
        shll2   r0
        mov     r4,r1
        add     r0,r1
        mov.l   @(r0,r4),r0
        add     #-4,r1
        mov.l   @r1,r2
        rts     
        add     r2,r0

into this:
_f:
        shll2   r5
        add     r5,r4
        mov.l   @(4,r4),r0
        mov.l   @(8,r4),r1
        rts     
        add     r1,r0

... which is minimal on SH.


It also fixes several missed auto-inc opportunities and was meant to
allow further address mode related optimizations like displacement
range fitting or access reordering.

Although not yet ready for mainline, the current code can be found on
github:

https://github.com/erikvarga/gcc/commits/master

https://github.com/erikvarga/gcc/blob/master/gcc/ams.h
https://github.com/erikvarga/gcc/blob/master/gcc/ams.cc

The way AMS gets address mode information from the backend differs
from GCC's current approach:

https://github.com/erikvarga/gcc/blob/master/gcc/config/sh/sh.c#L11946

Since the SH ISA is a bit irregular, there are a bunch of exceptions
and special cases in the cost calculations which take surrounding insns
and mem accesses into account.  I guess a more regular or less
restrictive ISA wouldn't need too many special cases.


Cheers,
Oleg
