Issue 172193
Summary [missed-opt] Naive per-bit loop not optimized to popcnt
Labels new issue
Assignees
Reporter purplesyringa
    [Godbolt](https://godbolt.org/z/P5roxr96j)

In this snippet, `popcount_naive` is lowered as an unrolled loop, while `popcount_bit_tricks` is optimized to a `popcnt` instruction. This seems kind of backwards -- I would've expected a simpler loop to get recognized as an idiom than the more complex one.

```cpp
int popcount_naive(unsigned long x) {
    int result = 0;
    for (int i = 0; i < 64; i++) {
        result += (x >> i) & 1;
    }
    return result;
}

int popcount_bit_tricks(unsigned long x) {
    int result = 0;
    while (x) {
        result++;
        x &= x - 1;
    }
    return result;
}
```

Though I imagine improving this would be non-trivial, since you'd have to recognize this either before the loop is unrolled or somehow find all the accesses later...

Bonus:

```cpp
int popcount_divide_and_conquer(unsigned long x) {
    x = (x & 0x5555555555555555) + ((x >> 1 ) & 0x5555555555555555);
    x = (x & 0x3333333333333333) + ((x >> 2 ) & 0x3333333333333333);
    x = (x & 0x0f0f0f0f0f0f0f0f) + ((x >> 4 ) & 0x0f0f0f0f0f0f0f0f);
    x = (x & 0x00ff00ff00ff00ff) + ((x >> 8 ) & 0x00ff00ff00ff00ff);
    x = (x & 0x0000ffff0000ffff) + ((x >> 16) & 0x0000ffff0000ffff);
    x = (x & 0x00000000ffffffff) + ((x >> 32) & 0x00000000ffffffff);
    return x;
}
```

This is another popcount idiom, and it's always more performant to optimize it to `popcnt` if it's available.

It also seems reasonable to optimize `popcount_naive` to `popcount_divide_and_conquer` on targets without `popcnt`. (Optimizing `popcount_bit_tricks` to `popcount_divide_and_conquer` is more iffy, since the latter has worse performance on numbers with low popcount.)
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs

Reply via email to