Issue 159552
Summary (ARM & AArch64) Consider special casing 8-bit popcount
Labels new issue
Assignees
Reporter Explorer09
This is more of a feature request than a bug.

This is a follow-up of #158741, about an 8-bit popcount operation.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
unsigned int popcount_8(uint8_t x) {
    // Initialize the vector register. Set all lanes at once so that the
    // compiler does not emit an instruction to zero-initialize the other lanes.
    uint8x8_t v = vdup_n_u8(x);
    // Count the number of set bits for each lane (8-bit) in the vector.
    v = vcnt_u8(v);
    // Get lane 0 and discard lanes 1 to 7. (Return type was uint8_t)
    return vget_lane_u8(v, 0);
}
#endif

unsigned int popcount_8_b(uint8_t x) {
    return (unsigned int)__builtin_popcount(x);
}
```

When targeting AArch64, the two functions compile to different code:
(clang trunk-20250918)
```assembly
popcount_8:
        fmov    s0, w0
        cnt     v0.8b, v0.8b
        umov    w0, v0.b[0]
        ret
popcount_8_b:
        and     w8, w0, #0xff
        fmov    s0, w8
        cnt     v0.8b, v0.8b
        fmov    w0, s0
        ret

(gcc 15.2.0)
```assembly
popcount_8:
        dup     v31.8b, w0
        cnt     v31.8b, v31.8b
        umov    w0, v31.b[0]
        ret
```

Compared with the solution currently implemented in #158741, the intrinsic version also saves the bitwise AND operation. I think this is the smallest code possible for an 8-bit popcount on AArch64 and ARMv7+NEON.
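
To illustrate why the AND is not needed, here is a portable sketch that emulates a per-byte count in plain C (a SWAR bit trick, not the NEON instruction itself) and reads back only byte 0. The helper names `cnt_bytes_u32`, `popcount_8_lane0`, `popcount_8_ref` and `count_lane0_mismatches` are made up for this illustration. It assumes, as in the listings above, that each 8-bit lane is counted independently and only lane 0 is extracted, so whatever sits in the upper bits of the source register cannot affect the result.

```c
#include <stdint.h>

/* Hypothetical helper: count set bits in each byte of a 32-bit word
   independently, emulating a per-lane count on four 8-bit lanes. */
static uint32_t cnt_bytes_u32(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);                 /* 2-bit partial sums */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* 4-bit partial sums */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* one count per byte */
    return v;
}

/* Emulates "count every lane, then extract lane 0". The input is NOT
   masked with 0xff before counting. */
static unsigned int popcount_8_lane0(uint32_t w) {
    return cnt_bytes_u32(w) & 0xFFu;
}

/* Straightforward reference: count the bits of an 8-bit value one by one. */
static unsigned int popcount_8_ref(uint8_t x) {
    unsigned int n = 0;
    for (; x != 0; x >>= 1)
        n += x & 1u;
    return n;
}

/* Returns the number of inputs for which garbage in bits 8..31 changes
   the lane-0 result. */
static int count_lane0_mismatches(void) {
    static const uint32_t upper[] = { 0x000000u, 0xFFFFFFu, 0xA5A5A5u, 0x123456u };
    int mismatches = 0;
    for (unsigned int i = 0; i < sizeof upper / sizeof upper[0]; i++) {
        for (unsigned int x = 0; x < 256; x++) {
            uint32_t w = (upper[i] << 8) | x;
            if (popcount_8_lane0(w) != popcount_8_ref((uint8_t)x))
                mismatches++;
        }
    }
    return mismatches;
}
```

Calling `count_lane0_mismatches()` checks all 256 byte values against the reference under several upper-bit patterns. This only models the lane independence argument; it says nothing about what either compiler should emit.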