https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87528
Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #2 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
x86 has native popcount only with -msse4.2. Otherwise, popcount(int) first zero-extends its argument to 64 bits and then calls __popcountdi2 (the 64-bit libgcc popcount). If the original code computes popcount on narrow types, or operates on values with only a few non-zero bits, the libgcc replacement can be expected to be slower than the loop it replaces. Even if popcount detection is an optimization size-wise, speed-wise GCC probably should avoid replacing a simple loop with a libgcc call (just as final value replacement avoids replacing a loop with computations involving modulus/division).
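For illustration, a minimal sketch of the two forms being compared. The function names (popcount_loop, popcount_builtin, check_popcount_agree) are hypothetical and not from the report. The loop clears the lowest set bit each iteration, so its cost scales with the number of set bits; this is the kind of loop that GCC's popcount idiom detection may replace with a popcount call, which, without native popcount support, becomes the libgcc call described above.

```c
#include <assert.h>

/* Hypothetical: counts set bits by clearing the lowest set bit each
   iteration, so it runs once per set bit. Cheap for inputs with only a
   few non-zero bits. GCC may recognize this loop as a popcount idiom. */
int popcount_loop(unsigned int x)
{
    int n = 0;
    while (x) {
        x &= x - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical: explicit builtin. With -msse4.2 (or -mpopcnt) this can be
   a single POPCNT instruction; otherwise it is a libgcc call. */
int popcount_builtin(unsigned int x)
{
    return __builtin_popcount(x);
}

/* Hypothetical helper: checks that both versions agree for all inputs
   below limit. */
void check_popcount_agree(unsigned int limit)
{
    for (unsigned int v = 0; v < limit; v++)
        assert(popcount_loop(v) == popcount_builtin(v));
}
```

Compiling this with and without -msse4.2 and inspecting the assembly is one way to see whether the loop was replaced and whether the result is a POPCNT instruction or a libgcc call.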