https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984
Bug ID: 118984
Summary: Unnecessary instructions are emitted when addition terms are in an unfortunate order
Product: gcc
Version: 14.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: maxim.yegorushkin at gmail dot com
Target Milestone: ---
Unnecessary instructions are emitted when addition terms are in an unfortunate
order.
The following two functions:
```
#include <immintrin.h>
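__v2di hsum2a(__v4di v4) noexcept {
    // upper 128-bit half + lower 128-bit half
    __v2di v2 = _mm256_extracti128_si256(v4, 1) + _mm256_castsi256_si128(v4);
    // move element 1 down to element 0 and add: element 0 now holds the sum of all four lanes
    return v2 + _mm_shuffle_epi32(v2, 0b1110);
}
__v2di hsum2b(__v4di v4) noexcept {
    // same as hsum2a, only the operand order of the first addition is swapped
    __v2di v2 = _mm256_castsi256_si128(v4) + _mm256_extracti128_si256(v4, 1);
    return v2 + _mm_shuffle_epi32(v2, 0b1110);
}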
```
when compiled with `gcc-14.2 -std=c++20 -pthread -Wall -Wextra -Werror
-march=znver3 -O2 -mtune=znver3`, produce code with a different number of
instructions:
```
hsum2a(long long vector[4]):
vextracti128 xmm1, ymm0, 0x1
vpaddq xmm0, xmm1, xmm0
vpshufd xmm1, xmm0, 14
vpaddq xmm0, xmm0, xmm1
ret
hsum2b(long long vector[4]):
vmovdqa xmm1, xmm0
vextracti128 xmm0, ymm0, 0x1
vpaddq xmm0, xmm0, xmm1
vpshufd xmm1, xmm0, 14
vpaddq xmm0, xmm0, xmm1
ret
```
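The extra `vmovdqa` in `hsum2b` is unnecessary: `vpaddq` is commutative, so
`hsum2b` could be compiled to exactly the same sequence as `hsum2a`
(`vextracti128 xmm1, ymm0, 0x1` followed by `vpaddq xmm0, xmm1, xmm0`).
Why does gcc emit it?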
`clang` produces identical code with no superfluous instructions for both
functions: https://godbolt.org/z/dzd69oz4W