https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61301
Bug ID: 61301
Summary: missed optimization of move if vector passed by reference
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch

In the following test, shuffle2 generates unoptimized moves; the other two
functions are fine. The problem occurs in real code when the vector is a data
member of a class and the function is a method, as in "foo".

    typedef float __attribute__( ( vector_size( 16 ) ) ) float32x4_t;
    typedef int __attribute__( ( vector_size( 16 ) ) ) int32x4_t;

    float32x4_t shuffle1(float32x4_t x) {
      return float32x4_t{x[1],x[0],x[3],x[2]};
    }

    float32x4_t shuffle2(float32x4_t const & x) {
      return float32x4_t{x[1],x[0],x[3],x[2]};
    }

    float32x4_t shuffle3(float32x4_t const & x) {
      return __builtin_shuffle(x,int32x4_t{1,0,3,2});
    }

    struct foo {
      float32x4_t x;
      float32x4_t shuffle2() const;
      float32x4_t shuffle3() const;
    };

    float32x4_t foo::shuffle2() const {
      return float32x4_t{x[1],x[0],x[3],x[2]};
    }

    float32x4_t foo::shuffle3() const {
      return __builtin_shuffle(x,int32x4_t{1,0,3,2});
    }

Compiled with

    c++ -std=c++11 -Ofast -march=nehalem -S shuffle.cc; cat shuffle.s

this generates:

    __Z8shuffle1U8__vectorf:
    LFB0:
            shufps  $177, %xmm0, %xmm0
            ret
    __Z8shuffle2RKU8__vectorf:
    LFB1:
            movss   12(%rdi), %xmm1
            insertps        $0x10, 8(%rdi), %xmm1
            movss   4(%rdi), %xmm0
            insertps        $0x10, (%rdi), %xmm0
            movlhps %xmm1, %xmm0
            ret
    __Z8shuffle3RKU8__vectorf:
    LFB2:
            movaps  (%rdi), %xmm0
            shufps  $177, %xmm0, %xmm0
            ret
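
For readers checking the expected output: the immediate in `shufps $177` above encodes the same {1,0,3,2} permutation that shuffle2 spells out element by element, so a single load plus one shufps would be enough for the by-reference case too. The sketch below is only a scalar model of that encoding, not GCC code; the helper names `shufps_imm`, `shufps_same` and `check_shuffle_model` are made up for illustration, and the model assumes both shufps operands are the same register, as in the listing.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack a 4-lane permutation into the 8-bit shufps
// immediate (two selector bits per result lane, lane 0 in the low bits).
constexpr std::uint8_t shufps_imm(int s0, int s1, int s2, int s3) {
    return static_cast<std::uint8_t>((s0 & 3) | ((s1 & 3) << 2) |
                                     ((s2 & 3) << 4) | ((s3 & 3) << 6));
}

// Hypothetical scalar model of `shufps $imm, %xmmN, %xmmN`, i.e. with source
// and destination being the same register, so every lane is picked from v.
inline std::array<float, 4> shufps_same(const std::array<float, 4>& v,
                                        std::uint8_t imm) {
    return {{ v[imm & 3], v[(imm >> 2) & 3],
              v[(imm >> 4) & 3], v[(imm >> 6) & 3] }};
}

// The permutation {1,0,3,2} from the test case packs to the immediate 177.
static_assert(shufps_imm(1, 0, 3, 2) == 177,
              "{1,0,3,2} should encode to the immediate seen in the listing");

// Hypothetical self-check: the modelled shufps swaps lanes pairwise, which is
// what float32x4_t{x[1],x[0],x[3],x[2]} computes.
inline void check_shuffle_model() {
    const std::array<float, 4> x{{10.f, 11.f, 12.f, 13.f}};
    const std::array<float, 4> y = shufps_same(x, shufps_imm(1, 0, 3, 2));
    assert(y[0] == x[1] && y[1] == x[0] && y[2] == x[3] && y[3] == x[2]);
}
```

Calling `check_shuffle_model()` from any test driver exercises the model; the `static_assert` already ties the permutation to the immediate at compile time.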