[llvm-bugs] [Bug 131588] x86 avx2 vpor is first done on calculation-heavy operands

LLVM Bugs via llvm-bugs Mon, 17 Mar 2025 02:34:37 -0700

Issue	131588
Summary	x86 avx2 vpor is first done on calculation-heavy operands
Labels	new issue
Assignees
Reporter	ImpleLee

See the code and the compilation result at https://godbolt.org/z/Kchh341vW. The loop ORs several operands together with `vpor`; some of those operands are cheap to compute, while others are not. Compilation flags: `-O3 -std=c++2b -march=skylake`.


```c++
#include <experimental/simd>
#include <cstdint>
namespace stdx = std::experimental;

template <class T, std::size_t N>
using simd_of = stdx::simd<T, stdx::simd_abi::deduce_t<T, N>>;

using data_t = simd_of<std::uint64_t, 4>;

data_t f(data_t a, data_t b) {
    while (true) {
        data_t result = a;
        result |= (a << 1) & std::uint64_t(0x802008020080200);
        result |= a >> 1;
        result |= a >> 10;
        data_t temp = a << 50;
        result |= data_t([=](auto i) {
            if constexpr (i + 1 >= 4) return 0;
            else return temp[i + 1];
        });
        result &= b;
        if (all_of((result & ~a) == 0)) return a;
        a = result;
    }
}
```
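For readers unfamiliar with `std::experimental::simd`, one iteration of the loop body can be sketched per lane as below. This is only an illustrative scalar model, not part of the reproducer: `lanes_t` and `step` are hypothetical names, and the generator lambda is modelled as "lane `i` takes `a << 50` from lane `i + 1`, and the last lane takes 0".

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for the 4 x uint64_t simd type.
using lanes_t = std::array<std::uint64_t, 4>;

// One iteration of the loop body, written lane by lane.
lanes_t step(const lanes_t& a, const lanes_t& b) {
    lanes_t result{};
    for (std::size_t i = 0; i < 4; ++i) {
        std::uint64_t r = a[i];
        r |= (a[i] << 1) & std::uint64_t(0x802008020080200);  // cheap: shift + mask
        r |= a[i] >> 1;                                        // cheap: shift
        r |= a[i] >> 10;                                       // cheap: shift
        // Expensive in the vector version: needs a cross-lane move.
        // Lane i takes (a << 50) from lane i + 1; the last lane takes 0.
        r |= (i + 1 < 4) ? (a[i + 1] << 50) : 0;
        result[i] = r & b[i];
    }
    return result;
}
```

All four `|=` terms feed the same OR reduction, so the order in which they are combined does not affect the result, only the dependency chain.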

The assembly of the loop is as follows.
```asm
.LBB0_1:
        vmovdqa %ymm4, %ymm3
        vpaddq  %ymm4, %ymm4, %ymm4
        vpand   %ymm1, %ymm4, %ymm4
        vpsrlq  $1, %ymm3, %ymm5
        vpsrlq  $10, %ymm3, %ymm6
        vpor    %ymm6, %ymm5, %ymm5
        vpsllq  $50, %ymm3, %ymm6
        vpermq  $249, %ymm6, %ymm6 # latency 3 on skylake
        vpblendd        $192, %ymm2, %ymm6, %ymm6
        vpor    %ymm6, %ymm5, %ymm5 # ymm6 is heavy to calculate, but or'ed first
        vpor    %ymm3, %ymm5, %ymm5 # ymm3 and ymm4 are cheap to calculate, but or'ed later
        vpor    %ymm4, %ymm5, %ymm4
        vpand   %ymm0, %ymm4, %ymm4
        vptest  %ymm4, %ymm3
        jae     .LBB0_1
```

The critical path of this loop is `vmovdqa -> vpsllq $50 -> vpermq -> vpblendd -> vpor -> vpor -> vpor -> vpand`. If ymm6 were OR'ed in last instead, the other two `vpor`s would not need to be on the critical path.
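To make the difference concrete, here is a rough latency model of the two orderings. It is a sketch under stated assumptions rather than a measurement: every simple vector op is assumed to take 1 cycle, `vpermq` 3 cycles (as annotated in the listing), the `vmovdqa` is assumed to be free (move elimination), and port contention is ignored. The function names are hypothetical.

```cpp
#include <algorithm>

// Assumed latencies in cycles (illustrative, not measured).
const int kSimple = 1;  // vpaddq, vpand, vpor, vpsrlq, vpsllq, vpblendd
const int kPermq  = 3;  // vpermq, per the comment in the listing

// Ready time of a two-input op: wait for both inputs, then add the latency.
int op(int lhs_ready, int rhs_ready, int latency) {
    return std::max(lhs_ready, rhs_ready) + latency;
}

// Ready times of the operands, given that `a` is ready at cycle a_ready.
int heavy_ready(int a_ready) {         // vpsllq -> vpermq -> vpblendd (ymm6)
    return a_ready + kSimple + kPermq + kSimple;
}
int masked_shift_ready(int a_ready) {  // vpaddq -> vpand (ymm4)
    return a_ready + kSimple + kSimple;
}
int shifts_or_ready(int a_ready) {     // vpsrlq $1 | vpsrlq $10 (ymm5)
    return op(a_ready + kSimple, a_ready + kSimple, kSimple);
}

// Order in the listing: the heavy operand is OR'ed first.
int emitted_order(int a_ready) {
    int acc = op(shifts_or_ready(a_ready), heavy_ready(a_ready), kSimple);
    acc = op(acc, a_ready, kSimple);                      // | ymm3
    acc = op(acc, masked_shift_ready(a_ready), kSimple);  // | ymm4
    return op(acc, 0, kSimple);                           // & b (loop-invariant)
}

// Suggested order: cheap operands first, the heavy operand last.
int reassociated_order(int a_ready) {
    int acc = op(shifts_or_ready(a_ready), a_ready, kSimple);  // | ymm3
    acc = op(acc, masked_shift_ready(a_ready), kSimple);       // | ymm4
    acc = op(acc, heavy_ready(a_ready), kSimple);              // | ymm6
    return op(acc, 0, kSimple);                                // & b
}
```

Under these assumptions, `emitted_order(0)` comes out at 9 cycles per iteration and `reassociated_order(0)` at 7, since only one `vpor` then follows the `vpblendd`.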

