| Field | Value |
|---|---|
| Issue | 136838 |
| Summary | Auto-vectorizer generates different code for the main loop and for the trailing section |
| Labels | new issue |
| Assignees | |
| Reporter | dmenendez-gruposantander |
Repro and code are available on godbolt: https://godbolt.org/z/c936s31hn
Function `g()` in the example is expected to be auto-vectorized and to produce the same results as function `f()`, which is not vectorized. This holds for the code generated for the "main" part of the loop: I can see it in the generated assembly and in the test results.
However, the code generated for the **trailing** part of the loop does not match the code generated for the "main" part of the loop: the trailing section fails to fuse a multiplication and an addition.
The example program shows this effect:
- it compares output of `f()` and `g()` for 1 element
- it compares the output of `g()` for 5 equal input elements, showing that the fifth result (which comes out of the trailing part of the loop) differs from the other four.
For reference, the godbolt link compares against gcc, which generates the expected result.
```
$ clang --version
Ubuntu clang version 19.1.7 (++20250114103320+cd708029e0b2-1~exp1~20250114103432.75)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm-19/bin
```
Source of installation: https://apt.llvm.org/jammy/dists/llvm-toolchain-jammy-19/main/