https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94787
Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #3 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Gabriel Ravier from comment #1)
> Inversely, I'd also suggest doing the opposite. That is, if there is no
> hardware popcount instruction, `__builtin_popcount(v) == 1` should be
> optimized to `v && !(v & (v - 1))`

I actually posted a patch for this and for popcount(x) > 1, given that the
reverse transformation is faster on all targets, even if they have a popcount
instruction (since those are typically more expensive). This is true on x86 as
well: (x-1) <u (x & -x) is never slower than using popcount.

So I suggest not to have LLVM emit popcount for this if there is a popcount
instruction, since that is non-optimal for pretty much every target. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693
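
For reference, the forms discussed in this comment can be written out as plain C. This is only an illustrative sketch, not the patch itself: the function names and the `check_equivalence` helper are made up here, `unsigned int` is assumed as the operand type, and `__builtin_popcount` is assumed to be available (GCC or Clang). `<u` in the comment means an unsigned comparison, which is what `<` does on `unsigned int` operands.

```c
#include <assert.h>
#include <stdbool.h>

/* Form 1: the builtin, which may lower to a hardware popcount. */
static bool single_bit_popcount(unsigned int x)
{
    return __builtin_popcount(x) == 1;
}

/* Form 2: the expansion suggested in comment #1. */
static bool single_bit_and(unsigned int x)
{
    return x && !(x & (x - 1));
}

/* Form 3: (x-1) <u (x & -x). For x == 0 the left side wraps to
   UINT_MAX and the right side is 0, so the result is false. */
static bool single_bit_cmp(unsigned int x)
{
    return (x - 1) < (x & -x);
}

/* popcount(x) > 1 without popcount: clearing the lowest set bit
   leaves something behind only if at least two bits were set. */
static bool more_than_one_bit(unsigned int x)
{
    return (x & (x - 1)) != 0;
}

/* Hypothetical helper: checks that all forms agree on [0, limit). */
static void check_equivalence(unsigned int limit)
{
    for (unsigned int x = 0; x < limit; x++) {
        bool expected = single_bit_popcount(x);
        assert(single_bit_and(x) == expected);
        assert(single_bit_cmp(x) == expected);
        assert(more_than_one_bit(x) == (__builtin_popcount(x) > 1));
    }
}
```

Whether form 2 or form 3 is actually cheaper than popcount depends on the target and on what the compiler emits, so the sketch only demonstrates that the forms compute the same result.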