https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984
Bug ID: 118984
Summary: Unnecessary instructions are emitted when addition terms are in an unfortunate order
Product: gcc
Version: 14.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: maxim.yegorushkin at gmail dot com
Target Milestone: ---

Unnecessary instructions are emitted when addition terms are in an unfortunate order.

The following two functions:

```
#include <immintrin.h>

__v2di hsum2a(__v4di v4) noexcept {
    __v2di v2 = _mm256_extracti128_si256(v4, 1) + _mm256_castsi256_si128(v4);
    return v2 + _mm_shuffle_epi32(v2, 0b1110);
}

__v2di hsum2b(__v4di v4) noexcept {
    __v2di v2 = _mm256_castsi256_si128(v4) + _mm256_extracti128_si256(v4, 1);
    return v2 + _mm_shuffle_epi32(v2, 0b1110);
}
```

When compiled with `gcc-14.2 -std=c++20 -pthread -Wall -Wextra -Werror -march=znver3 -O2 -mtune=znver3`, they produce code with a different number of instructions:

```
hsum2a(long long vector[4]):
        vextracti128 xmm1, ymm0, 0x1
        vpaddq xmm0, xmm1, xmm0
        vpshufd xmm1, xmm0, 14
        vpaddq xmm0, xmm0, xmm1
        ret
hsum2b(long long vector[4]):
        vmovdqa xmm1, xmm0
        vextracti128 xmm0, ymm0, 0x1
        vpaddq xmm0, xmm0, xmm1
        vpshufd xmm1, xmm0, 14
        vpaddq xmm0, xmm0, xmm1
        ret
```

That extra `vmovdqa` instruction in `hsum2b` is unnecessary and shouldn't be there. Why does gcc emit it, please?

`clang` produces identical code with no superfluous instructions for both functions: https://godbolt.org/z/dzd69oz4W
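
For reference, here is a rough scalar model of what the two functions compute. It is only a sketch, and it needs no AVX2 hardware to run. The names `Vec4`, `Vec2`, `add`, `low_half`, `high_half`, `shuffle_1110`, `hsum2a_model`, `hsum2b_model` and `check_orders_agree` are hypothetical and not part of the original report. Lanes are modelled as unsigned 64-bit integers on the assumption that `vpaddq` wraps around on overflow. Since lane-wise addition is commutative, both operand orders should give the same result, so the extra `vmovdqa` is not needed for correctness.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-ins for __v4di / __v2di. Unsigned lanes are an
// assumption made here so that wrap-around is well defined in C++.
using Vec4 = std::array<std::uint64_t, 4>;
using Vec2 = std::array<std::uint64_t, 2>;

// Lane-wise 64-bit addition, modelling vpaddq.
inline Vec2 add(Vec2 a, Vec2 b) { return {a[0] + b[0], a[1] + b[1]}; }

// Models _mm256_castsi256_si128: the low 128 bits (lanes 0 and 1).
inline Vec2 low_half(Vec4 v) { return {v[0], v[1]}; }

// Models _mm256_extracti128_si256(v, 1): the high 128 bits (lanes 2 and 3).
inline Vec2 high_half(Vec4 v) { return {v[2], v[3]}; }

// Models _mm_shuffle_epi32(v, 0b1110) on two 64-bit lanes:
// result lane 0 is source lane 1; result lane 1 is the low 32 bits of
// source lane 0 repeated twice (a by-product the caller does not rely on).
inline Vec2 shuffle_1110(Vec2 v) {
    std::uint64_t lo32 = v[0] & 0xFFFFFFFFu;
    return {v[1], lo32 | (lo32 << 32)};
}

// Same operand order as hsum2a: high half + low half.
inline Vec2 hsum2a_model(Vec4 v4) {
    Vec2 v2 = add(high_half(v4), low_half(v4));
    return add(v2, shuffle_1110(v2));
}

// Same operand order as hsum2b: low half + high half.
inline Vec2 hsum2b_model(Vec4 v4) {
    Vec2 v2 = add(low_half(v4), high_half(v4));
    return add(v2, shuffle_1110(v2));
}

// Both orders must agree in every lane because addition commutes.
inline void check_orders_agree(Vec4 v4) {
    assert(hsum2a_model(v4) == hsum2b_model(v4));
}
```

In this model, lane 0 of the result holds the sum of all four input lanes. Calling `check_orders_agree` on any input should pass regardless of which half is written first.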