http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
Bug #: 54855
Summary: Unnecessary duplication when performing scalar operation on vector element
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: drepper....@gmail.com

Take the following code:

    #include <stdio.h>

    typedef double v2df __attribute__((vector_size(16)));

    int main(int argc, char *argv[])
    {
      v2df v = { 2.0, 2.0 };
      v2df v2 = { 2.0, 2.0 };
      while (argc-- > 1)
        {
          v[0] -= 1.0;
          v *= v2;
        }
      printf("%g\n", v[0] + v[1]);
      return 0;
    }

It compiles as both C and C++, and both compilers behave the same way. When compiling for x86-64 (where SSE2 is always available), GCC generates this code for the loop:

    4003f0: 66 0f 28 c1    movapd %xmm1,%xmm0
    4003f4: 83 e8 01       sub    $0x1,%eax
    4003f7: f2 0f 5c c2    subsd  %xmm2,%xmm0
    4003fb: f2 0f 10 c8    movsd  %xmm0,%xmm1
    4003ff: 66 0f 58 c9    addpd  %xmm1,%xmm1
    400403: 75 eb          jne    4003f0 <main+0x20>

That is, the value is pulled out of the vector, the subtraction is performed, and then the scalar result is put back into the vector. Instead, the following sequence would be completely sufficient:

    sub   $0x1,%eax
    subsd %xmm2,%xmm1
    addpd %xmm1,%xmm1
    jne   ...back

The subsd instruction does not modify the upper 64 bits of its destination register, so the high element of the vector survives the subtraction untouched.

I know this is a special case: it only works if the scalar operation is applied to element zero of the vector. But code can be designed that way, and I have some code that would work nicely like this. I don't know whether this applies to other architectures as well.
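To make the semantics the report relies on concrete, here is a minimal sketch of the same loop written with SSE2 intrinsics from <emmintrin.h> (x86 only). The helper names run_loop and sum_after are hypothetical, introduced for illustration. _mm_sub_sd(a, b) subtracts the low lanes and copies the high lane of a into the result unchanged, which is exactly what "v[0] -= 1.0" means for a v2df. Spelling the low-lane operation out this way may let the compiler keep v in one register, but whether a given GCC version then emits the three-instruction loop suggested above is not guaranteed.

```c
#include <emmintrin.h> /* SSE2: __m128d, _mm_sub_sd, _mm_mul_pd, ... */
#include <stdio.h>

/* Hypothetical helper: runs the report's loop body `iterations` times.
   Equivalent to the original while (argc-- > 1) loop with
   iterations == argc - 1. */
__m128d run_loop(int iterations)
{
    __m128d v = _mm_set1_pd(2.0);        /* { 2.0, 2.0 } */
    const __m128d v2 = _mm_set1_pd(2.0); /* { 2.0, 2.0 } */
    const __m128d one = _mm_set_sd(1.0); /* { 1.0, 0.0 }; only the low lane is used */

    while (iterations-- > 0) {
        /* Low lane: v[0] - 1.0; high lane: copied from v unchanged. */
        v = _mm_sub_sd(v, one);
        /* Both lanes multiplied by 2.0, like "v *= v2". */
        v = _mm_mul_pd(v, v2);
    }
    return v;
}

/* Hypothetical helper: the value the original program prints, v[0] + v[1]. */
double sum_after(int iterations)
{
    double out[2];
    _mm_storeu_pd(out, run_loop(iterations)); /* out[0] = low lane, out[1] = high lane */
    return out[0] + out[1];
}
```

For example, printf("%g\n", sum_after(argc - 1)) from a main function reproduces the original program's output. After each iteration the low lane is back to 2.0 while the high lane doubles, so sum_after(3) is 2 + 16 = 18.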