https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113764

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[X86] Generates lzcnt when  |[X86] __builtin_clz
                   |bsr is sufficient           |generates lzcnt when bsr is
                   |                            |sufficient

--- Comment #4 from Roger Sayle <roger at nextmovesoftware dot com> ---
Yep, CLZ_DEFINED_VALUE_AT_ZERO really complicates things.  With a single
"global" macro it's currently impossible for a backend to support two different
CLZ instructions; one with defined behavior at zero, and the other with
undefined behavior at zero.

It might just be possible to do something encoding LZCNT patterns in RTL using:
(if_then_else:SI (ne:SI (reg:SI x) (const_int 0))
                 (clz:SI (reg:SI x))
                 (const_int VALUE))
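
As a rough C-level reading of that RTL shape (a sketch only, not a proposed
pattern): the helper name clz_or_value and its value_at_zero parameter are
hypothetical, standing in for VALUE above, and __builtin_clz is only called
on the non-zero path where it is defined.

```c
#include <stdint.h>

/* Hypothetical helper mirroring
     (if_then_else (ne x 0) (clz x) (const_int VALUE)).
   value_at_zero plays the role of VALUE; for a 32-bit LZCNT it would be 32.  */
static int clz_or_value(uint32_t x, int value_at_zero)
{
    /* __builtin_clz is undefined for 0, so guard it explicitly.  */
    return x != 0 ? __builtin_clz(x) : value_at_zero;
}
```

With value_at_zero == 32 this matches what LZCNT returns for a 32-bit
operand; any other VALUE would describe a different instruction.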

Additionally on x86_64, the BSR instruction sets the zero flag if its input is
zero (leaving the destination register undefined), which can be useful with
CMOV, i.e. it's possible to get defined behavior without an additional test and
branch.  But for Pawel's original testcase, __builtin_clz is undefined at zero,
so this really is a missed optimization, with either -Os or a modern -march
such as cascadelake or znver4.
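
To illustrate the BSR+CMOV idea in portable C (a model under stated
assumptions, not what GCC emits): struct bsr_result, bsr_model and
clz_via_bsr are hypothetical names. The model returns the bit index together
with a flag standing in for ZF, and the final select stands in for CMOV.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit BSR: the index is only meaningful
   when zf is false, mirroring the undefined destination on zero input.  */
struct bsr_result {
    uint32_t index;  /* position of the highest set bit, 0..31 */
    bool zf;         /* stands in for ZF: true when the input was zero */
};

static struct bsr_result bsr_model(uint32_t x)
{
    struct bsr_result r = { 0, x == 0 };
    if (!r.zf)
        r.index = 31u - (uint32_t)__builtin_clz(x);
    return r;
}

/* CLZ built from BSR: for non-zero x, clz == 31 - index == 31 ^ index.
   The conditional models a CMOV keyed on ZF, so no separate test of x
   is needed to get a defined result at zero.  */
static uint32_t clz_via_bsr(uint32_t x, uint32_t value_at_zero)
{
    struct bsr_result r = bsr_model(x);
    uint32_t clz = 31u ^ r.index;
    return r.zf ? value_at_zero : clz;
}
```

When the caller guarantees x != 0, as with __builtin_clz, the select
disappears and only the BSR and XOR remain.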

I agree with Jakub, this is a can of worms; potentially a lot of effort for a
marginal improvement.
