http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
Bug #: 54855
Summary: Unnecessary duplication when performing scalar operation on vector element
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: [email protected]
ReportedBy: [email protected]
Take the following code:
#include <stdio.h>

typedef double v2df __attribute__((vector_size(16)));

int
main(int argc, char *argv[])
{
  v2df v = { 2.0, 2.0 };
  v2df v2 = { 2.0, 2.0 };
  while (argc-- > 1)
    {
      v[0] -= 1.0;
      v *= v2;
    }
  printf("%g\n", v[0] + v[1]);
  return 0;
}
It compiles as both C and C++, and both compilers behave the same way.
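For reference, the loop's arithmetic can be traced with plain doubles. The helper below, `reference_result`, is hypothetical and not part of the original report; its `iterations` parameter corresponds to `argc - 1`. It shows that element 0 stays at 2.0, since (2.0 - 1.0) * 2.0 = 2.0, while element 1 doubles on every pass, which gives a simple check for any rewritten version of the loop.

```c
/* Hypothetical scalar reference for the test case: tracks the two vector
 * elements as plain doubles. `iterations` plays the role of argc - 1. */
double reference_result(int iterations)
{
    double v0 = 2.0, v1 = 2.0;   /* v  = { 2.0, 2.0 } */
    const double m = 2.0;        /* v2 = { 2.0, 2.0 } */

    while (iterations-- > 0) {
        v0 -= 1.0;   /* v[0] -= 1.0: only element 0 changes */
        v0 *= m;     /* v *= v2: both elements are scaled */
        v1 *= m;
    }
    return v0 + v1;  /* the value the test case prints */
}
```

For example, `reference_result(1)` corresponds to running the program with one argument.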
When compiling for x86-64 (where SSE2 is always available), the following code is
generated for the loop:
4003f0: 66 0f 28 c1 movapd %xmm1,%xmm0
4003f4: 83 e8 01 sub $0x1,%eax
4003f7: f2 0f 5c c2 subsd %xmm2,%xmm0
4003fb: f2 0f 10 c8 movsd %xmm0,%xmm1
4003ff: 66 0f 58 c9 addpd %xmm1,%xmm1
400403: 75 eb jne 4003f0 <main+0x20>
I.e., the whole vector is copied into %xmm0, the subtraction is performed there,
and the low element is then merged back into %xmm1 with movsd.
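The emitted sequence can be modelled with SSE2 intrinsics from `<emmintrin.h>`. This is an x86-only sketch for illustration: `step_as_emitted` is a hypothetical name, and writing the code this way does not guarantee that the compiler emits these exact instructions. `_mm_move_sd(a, b)` takes the low element from `b` and the high element from `a`, which is what the register-to-register movsd does.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86/x86-64 only */

/* Hypothetical model of one loop iteration as emitted by GCC. */
__m128d step_as_emitted(__m128d v, __m128d v2)
{
    const __m128d one = _mm_set_sd(1.0);  /* { 1.0, 0.0 } */
    __m128d tmp = v;                      /* movapd %xmm1,%xmm0: copy whole vector */
    tmp = _mm_sub_sd(tmp, one);           /* subsd %xmm2,%xmm0: low lane only */
    v = _mm_move_sd(v, tmp);              /* movsd %xmm0,%xmm1: merge low lane back */
    return _mm_mul_pd(v, v2);             /* v *= v2 (shown as addpd in the report) */
}
```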
Instead, the following sequence would have been sufficient:
sub $0x1,%eax
subsd %xmm2,%xmm1
addpd %xmm1,%xmm1
jne ...back
This works because subsd only writes the low 64 bits of its destination register;
the high element is left untouched.
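The proposed sequence corresponds to applying `_mm_sub_sd` directly to the vector: it computes the low element as `a[0] - b[0]` and copies the high element from `a`, so no copy or merge is needed. A minimal x86-only sketch with hypothetical helper names (`step_in_place`, `run_in_place`); it illustrates the semantics and does not guarantee that GCC produces the three-instruction loop.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86/x86-64 only */

/* Hypothetical loop body matching the proposed sequence. */
__m128d step_in_place(__m128d v, __m128d v2)
{
    const __m128d one = _mm_set_sd(1.0);  /* only the low lane is read by subsd */
    v = _mm_sub_sd(v, one);               /* low = v0 - 1.0, high = v1 unchanged */
    return _mm_mul_pd(v, v2);             /* v *= v2 */
}

/* The test case's loop, rewritten with step_in_place. */
__m128d run_in_place(int iterations)
{
    __m128d v = _mm_set1_pd(2.0);
    const __m128d v2 = _mm_set1_pd(2.0);
    while (iterations-- > 0)
        v = step_in_place(v, v2);
    return v;
}
```

With AVX enabled, the VEX-encoded vsubsd takes the high element from its first source operand, so the same in-place pattern still applies.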
I know this is a special case: it only works when the scalar operation applies to
element zero of the vector. But code can be designed that way, and I have some
code that would benefit from it. I don't know whether this applies to other
architectures as well.
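One possible workaround, sketched under assumptions, that is not limited to element zero: instead of indexing the element, subtract a vector holding the amount in the target lane and 0.0 in the other. `sub_lane` is a hypothetical helper, not something from the report, and whether GCC 4.8 turns it into a single subpd (or the equivalent on other targets) is untested here. Subtracting +0.0 leaves the other lane unchanged under the default round-to-nearest mode, though a signaling NaN there would raise an invalid-operation exception.

```c
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical helper: subtract `amount` from element `lane` (must be 0 or 1)
 * using one full-width subtraction. Relies on GCC's generic vector extension
 * (vector subscripting and arithmetic), so it is not tied to SSE. */
v2df sub_lane(v2df v, int lane, double amount)
{
    v2df delta = { 0.0, 0.0 };
    delta[lane] = amount;  /* out-of-range lane would be undefined behavior */
    return v - delta;      /* other lane has +0.0 subtracted, so it is unchanged */
}
```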