https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113764
Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[X86] Generates lzcnt when  |[X86] __builtin_clz
                   |bsr is sufficient           |generates lzcnt when bsr is
                   |                            |sufficient

--- Comment #4 from Roger Sayle <roger at nextmovesoftware dot com> ---
Yep, CLZ_DEFINED_VALUE_AT_ZERO really complicates things. With a single
"global" macro, it's currently impossible for a backend to support two
different CLZ instructions: one with defined behavior at zero, and the other
with undefined behavior at zero. It might just be possible to do something
by encoding LZCNT patterns in RTL using:

  (if_then_else:SI (ne:SI (reg:SI x) (const_int 0))
                   (clz:SI (reg:SI x))
                   (const_int VALUE))

Additionally, on x86_64, the BSR instruction sets the zero flag if its input
is zero (in which case the destination register is undefined), which can be
useful with CMOV, i.e. it's possible to get defined behavior without an
additional test and branch.

But for Pawel's original testcase, __builtin_clz is undefined at zero, so this
really is a missed optimization, with either -Os or a modern -march such as
cascadelake or znver4.

I agree with Jakub, this is a can of worms; potentially a lot of effort for a
marginal improvement.
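
For illustration only, the two flavours of CLZ discussed above can be
sketched at the C source level. This is not Pawel's testcase and not
anything taken from the GCC sources; the function names clz_nonzero and
clz_or, and the parameter value_at_zero, are hypothetical. The second
function mirrors the shape of the if_then_else RTL pattern, with
value_at_zero standing in for VALUE.

```c
#include <assert.h>

/* Hypothetical helper: count leading zeros where the caller guarantees
   x != 0.  __builtin_clz is undefined at zero, so the compiler is free
   to pick an instruction whose result at zero is undefined (e.g. BSR).
   The assert only documents the precondition; it is compiled out with
   -DNDEBUG. */
static unsigned clz_nonzero(unsigned x)
{
    assert(x != 0);
    return (unsigned)__builtin_clz(x);
}

/* Hypothetical helper: count leading zeros with a caller-chosen result
   at zero.  Same shape as
     (if_then_else (ne x 0) (clz x) (const_int VALUE))
   with value_at_zero playing the role of VALUE. */
static unsigned clz_or(unsigned x, unsigned value_at_zero)
{
    return x != 0 ? (unsigned)__builtin_clz(x) : value_at_zero;
}
```

Passing the bit width of unsigned as value_at_zero gives the result LZCNT
defines for a zero input. Whether the compiler turns the conditional in
clz_or into a test and branch, a CMOV on the flag set by BSR, or a single
LZCNT depends on the target options and is exactly the code-generation
question this report is about.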