Issue | 124993
Summary | Suboptimal code generation for vectorized version of llvm.ctlz() for int64 on x86-64
Labels | new issue
Assignees |
Reporter | aneshlya

LLVM generates suboptimal code for `llvm.ctlz()` on vectors of 64-bit integers when targeting x86-64 instruction sets older than AVX-512 (SSE4 through AVX2). Performance measurements indicate that extracting the individual 64-bit values from the `ymm` register and applying `lzcnt` to each one separately yields a 25% improvement on AVX2 and a 124% improvement on SSE4 over the vectorized `llvm.ctlz` implementation.
Please see the example here: https://ispc.godbolt.org/z/EEErrednx
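
For readers without access to the link, the per-element approach described above can be sketched roughly as follows. This is not the code from the Godbolt example: the helper names `ctlz64` and `ctlz64_per_lane` are hypothetical, and the sketch assumes a GCC- or Clang-compatible compiler that provides `__builtin_clzll`. It only illustrates the semantics of counting leading zeros one 64-bit lane at a time; whether the compiler actually emits `lzcnt` for it depends on the target flags (for example `-mlzcnt`) and on whether the loop gets auto-vectorized again.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count leading zeros of one 64-bit value.
   Returns 64 for a zero input, which is what llvm.ctlz returns when
   its second argument is false. */
static inline unsigned ctlz64(uint64_t x) {
    /* __builtin_clzll (GCC/Clang builtin) is undefined for 0,
       so handle that case explicitly. */
    return x == 0 ? 64u : (unsigned)__builtin_clzll(x);
}

/* Hypothetical helper: apply the scalar count to each lane separately,
   instead of using a vectorized ctlz sequence. */
static void ctlz64_per_lane(const uint64_t *in, uint64_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = ctlz64(in[i]);
    }
}
```

With four lanes this corresponds to the contents of one `ymm` register; with two lanes, to one `xmm` register.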
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs