https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984

            Bug ID: 118984
           Summary: Unnecessary instructions are emitted when addition
                    terms are in an unfortunate order
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: maxim.yegorushkin at gmail dot com
  Target Milestone: ---

Unnecessary instructions are emitted when addition terms are in an unfortunate
order.

The following two functions:

```
#include <immintrin.h>

__v2di hsum2a(__v4di v4) noexcept {
    __v2di v2 = _mm256_extracti128_si256(v4, 1) + _mm256_castsi256_si128(v4);
    return v2 + _mm_shuffle_epi32(v2, 0b1110);
}

__v2di hsum2b(__v4di v4) noexcept {
    __v2di v2 = _mm256_castsi256_si128(v4) + _mm256_extracti128_si256(v4, 1);
    return v2 + _mm_shuffle_epi32(v2, 0b1110);
}
```

When compiled with `gcc-14.2 -std=c++20 -pthread -Wall -Wextra -Werror
-march=znver3 -O2 -mtune=znver3`, they produce code with different numbers of
instructions:

```
hsum2a(long long vector[4]):
        vextracti128    xmm1, ymm0, 0x1
        vpaddq  xmm0, xmm1, xmm0
        vpshufd xmm1, xmm0, 14
        vpaddq  xmm0, xmm0, xmm1
        ret
hsum2b(long long vector[4]):
        vmovdqa xmm1, xmm0
        vextracti128    xmm0, ymm0, 0x1
        vpaddq  xmm0, xmm0, xmm1
        vpshufd xmm1, xmm0, 14
        vpaddq  xmm0, xmm0, xmm1
        ret
```

That extra `vmovdqa` instruction in `hsum2b` is unnecessary and shouldn't be
there. Why does gcc emit it, please?
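
To illustrate why both operand orders are expected to give the same result, here is a minimal scalar sketch of lane 0 of the returned vector. The names `V4`, `V2`, `low_half`, `high_half`, `add`, `fold`, `hsum_model_a` and `hsum_model_b` are hypothetical stand-ins, not part of the test case. The sketch assumes that `_mm_shuffle_epi32(v2, 0b1110)` moves the upper 64-bit element into lane 0 and that `vpaddq` wraps modulo 2^64. Lane 1 of the real result is not modelled.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar stand-ins for __v4di and __v2di (unsigned, so that
// addition wraps the way vpaddq does instead of overflowing).
using V4 = std::array<std::uint64_t, 4>;
using V2 = std::array<std::uint64_t, 2>;

// Models _mm256_castsi256_si128: the low 128 bits.
constexpr V2 low_half(const V4& v) { return V2{v[0], v[1]}; }

// Models _mm256_extracti128_si256(v, 1): the high 128 bits.
constexpr V2 high_half(const V4& v) { return V2{v[2], v[3]}; }

// Models the element-wise 64-bit addition.
constexpr V2 add(const V2& a, const V2& b) {
    return V2{a[0] + b[0], a[1] + b[1]};
}

// Models lane 0 of `v2 + _mm_shuffle_epi32(v2, 0b1110)`: the shuffle is
// assumed to place the upper 64-bit element in lane 0.
constexpr std::uint64_t fold(const V2& v2) { return v2[0] + v2[1]; }

// Same operand order as hsum2a: high half + low half.
constexpr std::uint64_t hsum_model_a(const V4& v) {
    return fold(add(high_half(v), low_half(v)));
}

// Same operand order as hsum2b: low half + high half.
constexpr std::uint64_t hsum_model_b(const V4& v) {
    return fold(add(low_half(v), high_half(v)));
}
```

Because the addition is commutative, the two models agree for every input, which is why the operand order would not be expected to affect the number of instructions.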

`clang` produces identical code with no superfluous instructions for both
functions: https://godbolt.org/z/dzd69oz4W
