http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-08-24 09:45:56 UTC ---

What we IMHO should optimize, and currently don't, is the redundant sign extension when using __builtin_ffsl. Because it is expanded internally as bsf + cmove, nonzero_bits isn't able to figure out that only the low bits of the sequence's result can be nonzero (the value is always in the range 0..64 for a 64-bit long). Perhaps in that case we should add a REG_EQUAL note to the last insn in the sequence, and nonzero_bits could perhaps also look at REG_EQUAL notes. Doing that could also help testcases like:

int foo (long x) { return __builtin_popcountl (x) & 0xff; }

where the andl $255, %eax could be optimized away, etc.
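The nonzero_bits argument can be made concrete with a small C sketch. This is a toy model of the reasoning, not GCC's actual nonzero_bits code. It uses the documented range of __builtin_ffsl and __builtin_popcountl (both return a value between 0 and the bit width of long, so 0..64 on an LP64 target such as x86-64) and derives the mask of bits that can ever be set (0x7f on LP64). If the int sign bit lies outside that mask, widening the result to long by sign extension is the same as zero extension. Likewise, an & 0xff cannot clear any bit that might be set. The helper names mask_for_range, ffsl_result_mask, sign_extension_redundant, and_redundant and ffsl_widened are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper, NOT GCC's nonzero_bits(): the smallest all-ones
   mask that covers every value in [0, max]. */
static unsigned long mask_for_range(unsigned long max)
{
    unsigned long mask = 0;
    while (mask < max)
        mask = (mask << 1) | 1;
    return mask;
}

/* Possibly-nonzero bits of a __builtin_ffsl / __builtin_popcountl result:
   both lie in [0, number of bits in long], so 0x7f on LP64. */
static unsigned long ffsl_result_mask(void)
{
    return mask_for_range(sizeof(long) * CHAR_BIT);
}

/* True if the int sign bit (and everything above it) is known to be zero,
   in which case sign-extending the int to long equals zero-extending it. */
static bool sign_extension_redundant(unsigned long nz)
{
    return (nz >> (sizeof(int) * CHAR_BIT - 1)) == 0;
}

/* True if AND-ing with `mask` cannot clear any possibly-set bit in `nz`. */
static bool and_redundant(unsigned long nz, unsigned long mask)
{
    return (nz & ~mask) == 0;
}

/* Hypothetical testcase shape for the ffsl case: the int result must be
   widened to long, which is the sign extension the comment reports as
   redundant on x86-64. */
long ffsl_widened(long x)
{
    return __builtin_ffsl(x);
}
```

For example, on LP64 both sign_extension_redundant(ffsl_result_mask()) and and_redundant(ffsl_result_mask(), 0xff) hold. That is the kind of fact that would let the extension after bsf + cmove, and the andl in foo, be dropped if nonzero_bits could see the result's range.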