Issue: 124993
Summary: Suboptimal code generation for vectorized version of llvm.ctlz() for int64 on x86-64
Labels: new issue
Assignees:
Reporter: aneshlya
LLVM generates suboptimal code for vectorized `llvm.ctlz()` on the int64 type for x86-64 targets that lack AVX-512 (SSE4 through AVX2). Performance measurements indicate that extracting the individual 64-bit values from the `ymm` register and applying `lzcnt` to each one separately yields a 25% improvement on AVX2 and a 124% improvement on SSE4 compared to the vectorized `llvm.ctlz` implementation.

Please see the example here: https://ispc.godbolt.org/z/EEErrednx
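
For readers without access to the link, the per-element approach described above can be sketched roughly as follows. This is only an illustration, not the code from the godbolt example: the helper names `ctlz64` and `ctlz64_x4` are hypothetical, the four-element array stands in for the four 64-bit lanes of a `ymm` register, and `__builtin_clzll` is the GCC/Clang builtin, which compilers typically lower to `lzcnt` when the target supports it.

```c
#include <stdint.h>

/* Leading-zero count of one 64-bit value. Returns 64 for zero, which is
   what lzcnt produces and what llvm.ctlz.i64 returns when zero is not
   treated as poison. */
static inline uint64_t ctlz64(uint64_t x) {
    /* __builtin_clzll is undefined for 0, so handle that case explicitly. */
    return x ? (uint64_t)__builtin_clzll(x) : 64u;
}

/* Hypothetical helper: counts leading zeros in each of four 64-bit lanes
   one at a time, instead of using a vectorized sequence. */
static void ctlz64_x4(const uint64_t in[4], uint64_t out[4]) {
    for (int i = 0; i < 4; ++i) {
        out[i] = ctlz64(in[i]);
    }
}
```

Calling `ctlz64_x4` on an input array fills `out` with one count per lane. Whether this beats the vector lowering depends on the target and the surrounding code, so the figures quoted above should be taken from the linked example rather than from this sketch.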
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
