[llvm-bugs] [Bug 172217] Loop vectorizer generates inefficient code

LLVM Bugs via llvm-bugs Sun, 14 Dec 2025 08:02:43 -0800

Issue	172217
Summary	Loop vectorizer generates inefficient code
Labels	new issue
Assignees
Reporter	zijinshanren

    https://godbolt.org/z/MPPnvT5h8

The test code is simple:
```cpp
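#include <bit>      // std::byteswap (C++23)
#include <cstddef>  // size_t
#include <cstdint>  // int64_t
#include <span>     // std::span

// Indexed loop over a pointer and length.
void swap_ptr_impl(int64_t* ptr, size_t len) {
    for (size_t i = 0; i < len; i++) {
        ptr[i] = std::byteswap(ptr[i]);
    }
}

// Pointer-bumping loop over the same range.
void swap_ptr2_impl(int64_t* ptr, size_t len) {
    auto end = ptr + len;
    for (; ptr < end; ptr++) {
        *ptr = std::byteswap(*ptr);
    }
}

// Range-for over a span with dynamic extent.
void swap_span_impl(std::span<int64_t> sp) {
    for (auto& x : sp) {
        x = std::byteswap(x);
    }
}

// Range-for over a span with a compile-time extent of 1024.
void swap_span_2(std::span<int64_t, 1024> sp) {
    for (auto& x : sp) {
        x = std::byteswap(x);
    }
}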
```

On an i9-14900KF, `swap_ptr_impl` is about 2x slower than `swap_ptr2_impl` and `swap_span_impl`; on quickbench it is about 2.8x slower.
`swap_span_2` (span length known at compile time) is also about 2x slower.
```text
Run on (32 X 3187 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 2048 KiB (x16)
  L3 Unified 36864 KiB (x1)
------------------------------------------------------
Benchmark            Time             CPU   Iterations
------------------------------------------------------
swap_ptr           400 ns          390 ns      1723077
swap_ptr2          184 ns          180 ns      4072727
swap_span          176 ns          165 ns      4072727
swap_span_2        403 ns          399 ns      1723077
```
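
For anyone who wants to reproduce the comparison outside Godbolt or quickbench, a minimal timing sketch might look like the following. It is not the harness that produced the numbers above. The `bswap64` helper is a portable stand-in for `std::byteswap` so the sketch also builds before C++23, and the names `swap_indexed`, `swap_pointer`, `time_ns_per_call` and `variants_agree` are hypothetical. A 1024-element buffer is only a guess based on the fixed span extent in `swap_span_2`.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable stand-in for std::byteswap (assumption: not the reporter's code).
static inline int64_t bswap64(int64_t v) {
    uint64_t x = static_cast<uint64_t>(v);
    x = ((x & 0x00000000FFFFFFFFull) << 32) | ((x & 0xFFFFFFFF00000000ull) >> 32);
    x = ((x & 0x0000FFFF0000FFFFull) << 16) | ((x & 0xFFFF0000FFFF0000ull) >> 16);
    x = ((x & 0x00FF00FF00FF00FFull) << 8)  | ((x & 0xFF00FF00FF00FF00ull) >> 8);
    return static_cast<int64_t>(x);
}

// Indexed loop, same shape as swap_ptr_impl in the report.
void swap_indexed(int64_t* ptr, std::size_t len) {
    for (std::size_t i = 0; i < len; i++) {
        ptr[i] = bswap64(ptr[i]);
    }
}

// Pointer-bumping loop, same shape as swap_ptr2_impl in the report.
void swap_pointer(int64_t* ptr, std::size_t len) {
    int64_t* end = ptr + len;
    for (; ptr < end; ptr++) {
        *ptr = bswap64(*ptr);
    }
}

// Average wall-clock nanoseconds per call of fn over `reps` calls.
template <class Fn>
double time_ns_per_call(Fn fn, std::vector<int64_t>& buf, int reps) {
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; r++) {
        fn(buf.data(), buf.size());
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / reps;
}

// Sanity check: both loop shapes must produce identical output.
bool variants_agree(std::size_t n) {
    std::vector<int64_t> a(n), b(n);
    for (std::size_t i = 0; i < n; i++) {
        a[i] = b[i] = static_cast<int64_t>(i * 0x0102030405060708ull);
    }
    swap_indexed(a.data(), a.size());
    swap_pointer(b.data(), b.size());
    return a == b;
}
```

Calling `time_ns_per_call(swap_indexed, buf, reps)` and `time_ns_per_call(swap_pointer, buf, reps)` on the same buffer, with optimization enabled, gives two figures to compare. Absolute values will differ by machine, and a dedicated benchmark library guards against the optimizer more carefully than this sketch does.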

With `-fno-vectorize`, all four functions run at about the same speed:
```text
Run on (32 X 3187 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 2048 KiB (x16)
  L3 Unified 36864 KiB (x1)
------------------------------------------------------
Benchmark            Time             CPU   Iterations
------------------------------------------------------
swap_ptr           181 ns          184 ns      4072727
swap_ptr2          181 ns          180 ns      3733333
swap_span          173 ns          172 ns      3733333
swap_span_2        175 ns          173 ns      4072727
```
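
If turning off vectorization for the whole translation unit is too broad, Clang also accepts a per-loop hint, `#pragma clang loop vectorize(disable)`. The sketch below applies it to the indexed loop. Whether it recovers the scalar timings for this case has not been measured here. The function name `swap_ptr_novec` is hypothetical, and `__builtin_bswap64` (a GCC/Clang builtin) stands in for `std::byteswap` so the example builds before C++23.

```cpp
#include <cstddef>
#include <cstdint>

// Indexed byte-swap loop with the vectorizer disabled for this loop only.
// The pragma is Clang-specific, so it is guarded for other compilers.
void swap_ptr_novec(int64_t* ptr, std::size_t len) {
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
    for (std::size_t i = 0; i < len; i++) {
        // Stand-in for std::byteswap(ptr[i]) (assumption, see above).
        ptr[i] = static_cast<int64_t>(
            __builtin_bswap64(static_cast<uint64_t>(ptr[i])));
    }
}
```

Comparing the generated assembly with and without the pragma shows whether the loop is still being vectorized.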

So I assume something is wrong in the loop vectorizer. I have verified this with clang 17 and later.

