On 17/04/15 16:48, Alan Lawrence wrote:
> From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64134, testcase
> 
> #define vector __attribute__((vector_size(16)))
> 
> float a; float b;
> vector float fb(void) { return (vector float){ 0,0,b,a};}
> 
> currently produces (correct, but suboptimal):
> 
> fb:
>         fmov    s0, wzr
>         adrp    x1, b
>         adrp    x0, a
>         sub     sp, sp, #16
>         ldr     w1, [x1, #:lo12:b]
>         ldr     w0, [x0, #:lo12:a]
>         stp     s0, s0, [sp]
>         stp     w1, w0, [sp, 8]
>         ldr     q0, [sp]
>         add     sp, sp, 16
>         ret
> 
> with this patch:
> 
> fb:
>         adrp    x1, b
>         movi    v0.4s, 0
>         adrp    x0, a
>         ldr     s2, [x1, #:lo12:b]
>         ldr     s1, [x0, #:lo12:a]
>         ins     v0.s[2], v2.s[0]
>         ins     v0.s[3], v1.s[0]
>         ret
> 
> The reason is that aarch64_expand_vector_init presently loads a constant
> and then overwrites with 'ins' only if exactly one element of the vector
> is variable; otherwise, it dumps the entire vector out to the stack
> (later changed to STP) and then loads the whole vector in. This patch
> changes behaviour to load constants and then 'ins' if at most half the
> elements are variable rather than only one.
> 
> AFAICT this code path is only used for initialization of GCC vector
> extension vectors, and there is already a special case for all elements
> being the same (e.g. the _dup_ intrinsics). So it doesn't feel worth
> introducing a 'cost model'-type approach for this one use case (such
> would probably have to be based on an assumption about success of STP
> pattern later anyway). Instead this is a (relatively) simple heuristic
> improvement.
> 
> There is a possibility of using ld1 rather than ldr+ins, which *may*
> generate further improvement (probably requiring adrp+add due to limited
> addressing modes of ld1, however); this patch does not tackle that.
> 
> Tested on aarch64-none-elf.
> 
> gcc/ChangeLog:
> 
>     PR target/64134
>     * config/aarch64/aarch64.c (aarch64_expand_vector_init): Load constant
>     and overwrite variable parts if <= 1/2 the elements are variable.
> 
> gcc/testsuite/ChangeLog:
> 
>     PR target/64134
>     * gcc.target/aarch64/vec_init_1.c: New test.
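
For illustration only, the heuristic described in the quoted patch might be sketched in plain C as below. This is not the code in aarch64_expand_vector_init: the enum, the function name choose_init_strategy and the is_variable array are hypothetical, and the existing special case for all elements being the same is left out. It only shows the decision "load constants and then 'ins' if at most half the elements are variable".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical strategy names, invented for this sketch; they do not
   appear in GCC.  */
enum init_strategy {
    INIT_ALL_CONSTANT,    /* no variable lanes: load the constant vector */
    INIT_CONST_THEN_INS,  /* load constants, then 'ins' each variable lane */
    INIT_VIA_STACK        /* store every lane to the stack, reload as a vector */
};

/* Decide how to build a vector of n_elts lanes, where is_variable[i]
   is true when lane i is not a compile-time constant.  */
enum init_strategy
choose_init_strategy(const bool *is_variable, size_t n_elts)
{
    size_t n_var = 0;

    /* Count the lanes that are not constant.  */
    for (size_t i = 0; i < n_elts; i++)
        if (is_variable[i])
            n_var++;

    if (n_var == 0)
        return INIT_ALL_CONSTANT;

    /* Behaviour described for the patch: at most half the lanes variable.
       The behaviour described before the patch corresponds to the
       condition n_var == 1 here.  */
    if (n_var * 2 <= n_elts)
        return INIT_CONST_THEN_INS;

    return INIT_VIA_STACK;
}
```

For the testcase above, { 0, 0, b, a } has two variable lanes out of four, so this sketch picks INIT_CONST_THEN_INS, matching the movi + ins sequence shown; under the previous "exactly one" condition the same input would fall through to the stack path.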

OK.

R.
