On 17/04/15 16:48, Alan Lawrence wrote:
> From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64134, testcase
>
> #define vector __attribute__((vector_size(16)))
>
> float a; float b;
> vector float fb(void) { return (vector float){ 0,0,b,a};}
>
> currently produces (correct, but suboptimal):
>
> fb:
>         fmov    s0, wzr
>         adrp    x1, b
>         adrp    x0, a
>         sub     sp, sp, #16
>         ldr     w1, [x1, #:lo12:b]
>         ldr     w0, [x0, #:lo12:a]
>         stp     s0, s0, [sp]
>         stp     w1, w0, [sp, 8]
>         ldr     q0, [sp]
>         add     sp, sp, 16
>         ret
>
> with this patch:
>
> fb:
>         adrp    x1, b
>         movi    v0.4s, 0
>         adrp    x0, a
>         ldr     s2, [x1, #:lo12:b]
>         ldr     s1, [x0, #:lo12:a]
>         ins     v0.s[2], v2.s[0]
>         ins     v0.s[3], v1.s[0]
>         ret
>
> The reason is that aarch64_expand_vector_init presently loads a constant
> and then overwrites with 'ins' only if exactly one element of the vector
> is variable; otherwise, it dumps the entire vector out to the stack
> (later changed to STP) and then loads the whole vector in. This patch
> changes behaviour to load constants and then 'ins' if at most half the
> elements are variable rather than only one.
>
> AFAICT this code path is only used for initialization of GCC vector
> extension vectors, and there are already special cases for all elements
> being the same (e.g. the _dup_ intrinsics). So it doesn't feel worth
> introducing a 'cost model'-type approach for this one use case (such
> would probably have to be based on an assumption about success of STP
> pattern later anyway). Instead this is a (relatively) simple heuristic
> improvement.
>
> There is a possibility of using ld1 rather than ldr+ins, which *may*
> generate further improvement (probably requiring adrp+add due to limited
> addressing modes of ld1, however); this patch does not tackle that.
>
> Tested on aarch64-none-elf.
>
> gcc/ChangeLog:
>
>     PR target/64134
>     config/aarch64/aarch64.c (aarch64_expand_vector_init): Load constant
>     and overwrite variable parts if <= 1/2 the elements are variable.
>
> gcc/testsuite/ChangeLog:
>
>     PR target/64134
>     gcc.target/aarch64/vec_init_1.c: New test.
>
OK. R.
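
For anyone reading this thread later, the "at most half the elements are
variable" rule quoted above can be sketched roughly as follows. This is
only an illustration of the heuristic as described in the mail, not the
code from the patch: the enum, the function name choose_init_strategy and
the is_variable array are all hypothetical, and the existing special case
for all elements being the same is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical strategy names for illustration; not GCC identifiers.  */
enum init_strategy
{
  INIT_CONST_ONLY,      /* Every element is constant: load the constant.  */
  INIT_CONST_THEN_INS,  /* Load the constant part, then 'ins' each
                           variable element into its lane.  */
  INIT_VIA_STACK        /* Store all elements to memory and load the
                           whole vector back.  */
};

/* is_variable[i] is true when element i is not a compile-time constant.
   This is an assumed representation, chosen to keep the sketch small.  */
static enum init_strategy
choose_init_strategy (const bool *is_variable, size_t n_elts)
{
  size_t n_var = 0;

  for (size_t i = 0; i < n_elts; i++)
    if (is_variable[i])
      n_var++;

  if (n_var == 0)
    return INIT_CONST_ONLY;

  /* The rule described in the mail: at most half the elements variable.  */
  if (n_var * 2 <= n_elts)
    return INIT_CONST_THEN_INS;

  return INIT_VIA_STACK;
}
```

For the testcase in the mail, { 0, 0, b, a } has two variable elements out
of four, so this sketch picks the constant-then-'ins' strategy, which
matches the movi + ins sequence shown in the "with this patch" output.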