https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91526
Bug ID: 91526
Summary: Unnecessary SSE and other instructions generated when
compiling in C mode (vs. C++ mode)
Product: gcc
Version: 9.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: warp at iki dot fi
Target Milestone: ---
Consider the following piece of code:
//--------------------------------------------------------------
struct Vec { float v[8]; };

struct Vec multiply(const struct Vec* v1, const struct Vec* v2)
{
    struct Vec result;
    for(unsigned i = 0; i < 8; ++i)
        result.v[i] = v1->v[i] * v2->v[i];
    return result;
}
//--------------------------------------------------------------
If this is compiled as C++, using g++ 9.2 with options -Ofast -march=skylake,
the following result is produced:
_Z8multiplyPK3VecS1_:
    vmovups ymm0, YMMWORD PTR [rdx]
    mov rax, rdi
    vmulps ymm0, ymm0, YMMWORD PTR [rsi]
    vmovups YMMWORD PTR [rdi], ymm0
    vzeroupper
    ret
However, if it's compiled as C, using the same options, this is produced:
multiply:
    push rbp
    mov rax, rdi
    mov rbp, rsp
    and rsp, -32
    vmovups ymm0, YMMWORD PTR [rdx]
    vmulps ymm0, ymm0, YMMWORD PTR [rsi]
    vmovaps YMMWORD PTR [rsp-32], ymm0
    vmovdqa xmm2, XMMWORD PTR [rsp-16]
    vmovups XMMWORD PTR [rdi], xmm0
    vmovups XMMWORD PTR [rdi+16], xmm2
    vzeroupper
    leave
    ret
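Not only does the C version add extra instructions around the computation
(a stack frame is set up and realigned, and the result is spilled to the
stack), but the store of the result into [rdi] has also, for some reason,
been split into two 16-byte parts.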
Both clang and icc generate essentially the same code as the first listing
above, regardless of whether the source is compiled as C or C++.
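
For anyone reproducing this, the sketch below wraps the function from the
report in a small self-check, so the same file can be built in both C and C++
mode and confirmed to compute identical values (the difference reported here
is in code quality only, not in results). The helper count_mismatches() and
its test values are hypothetical and not part of the original report. The
-Ofast and -march=skylake options come from the report. Adding -S and
-masm=intel to get Intel-syntax assembly like the listings above is an
assumption about how they were generated.

```c
#include <assert.h>

struct Vec { float v[8]; };

/* Function from the report, unchanged. */
struct Vec multiply(const struct Vec* v1, const struct Vec* v2)
{
    struct Vec result;
    for(unsigned i = 0; i < 8; ++i)
        result.v[i] = v1->v[i] * v2->v[i];
    return result;
}

/* Hypothetical helper (not in the report): multiplies two known vectors
   and returns the number of elements that differ from the expected
   product. The inputs are chosen so every product is exactly
   representable as a float, which keeps the comparison exact. */
int count_mismatches(void)
{
    struct Vec a, b;
    for(unsigned i = 0; i < 8; ++i) {
        a.v[i] = (float)(i + 1);   /* 1, 2, ..., 8 */
        b.v[i] = 0.5f;
    }

    struct Vec r = multiply(&a, &b);

    int mismatches = 0;
    for(unsigned i = 0; i < 8; ++i) {
        float expected = 0.5f * (float)(i + 1);
        if(r.v[i] != expected)
            ++mismatches;
    }
    return mismatches;
}
```

Calling assert(count_mismatches() == 0) from a main function is enough to
exercise it. Running a binary built with -march=skylake requires a CPU with
AVX2 support.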