Issue 142042
Summary Performance: LLVM 20 aggressively optimizes popcnt and results in worse performance
Labels new issue
Assignees
Reporter jabraham17
    I am finding that the same microbenchmark is much slower when compiled with clang 20 than when compiled with clang 19.

[This link](https://godbolt.org/z/5Md1dG9YG) has the full benchmark and the assembly for clang 19 and 20.

The LLVM 20 assembly is much longer, apparently because LLVM 20 vectorizes the code instead of using the `popcnt` instruction. For some reason, the vectorized version is slower: the LLVM 20 build takes 0.15 s, while the LLVM 19 build takes 0.05 s.

Using the naive popcount loop in the C version does seem to get pattern matched better, and results in LLVM 20 being just as fast, if not faster:

```
    uint64_t c = 0;
    while (n) {
        n &= (n - 1);
        c++;
    }
    return c;
```
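For anyone trying to reproduce this outside of godbolt, the loop above can be wrapped in a function and checked against the compiler builtin. This is only a minimal sketch and not the code from the linked benchmark: the names `popcount_naive`, `popcount_builtin`, and `check_popcount` are hypothetical, and it assumes a GCC- or Clang-compatible compiler that provides `__builtin_popcountll`.

```c
#include <assert.h>
#include <stdint.h>

/* The naive loop from the snippet above: each iteration clears the
 * lowest set bit, so the loop runs once per set bit in n. */
uint64_t popcount_naive(uint64_t n) {
    uint64_t c = 0;
    while (n) {
        n &= (n - 1);
        c++;
    }
    return c;
}

/* Reference implementation using the GCC/Clang builtin (assumed to be
 * available). Whether it lowers to a single popcnt instruction depends
 * on the target flags. */
uint64_t popcount_builtin(uint64_t n) {
    return (uint64_t)__builtin_popcountll(n);
}

/* Hypothetical helper: asserts that both versions agree for one input. */
void check_popcount(uint64_t n) {
    assert(popcount_naive(n) == popcount_builtin(n));
}
```

Compiling both functions with clang 19 and clang 20 at the same optimization level and comparing the generated assembly should show whether either one is turned into `popcnt` or into a vectorized sequence.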
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs