At Thu, 14 Feb 2019 16:45:38 -0500, Tom Lane <t...@sss.pgh.pa.us> wrote in 
<822.1550180...@sss.pgh.pa.us>
> Andres Freund <and...@anarazel.de> writes:
> > On 2019-02-14 15:47:13 -0300, Alvaro Herrera wrote:
> >> Hah, I just realized you have to add -mlzcnt in order for these builtins
> >> to use the lzcnt instructions.  It goes from something like
> >> 
> >> bsrq       %rax, %rax
> >> xorq       $63, %rax
> 
> > I'm confused how this is a general count leading zero operation? Did you
> > use constants or something that allowed to infer a range in the test? If
> > so the compiler probably did some optimizations allowing it to do the
> > above.
> 
> No.  If you compile
> 
> int myclz(unsigned long long x)
> {
>   return __builtin_clzll(x);
> }
> 
> at -O2, on just about any x86_64 gcc, you will get
> 
> myclz:
> .LFB1:
>         .cfi_startproc
>         bsrq    %rdi, %rax
>         xorq    $63, %rax
>         ret
>         .cfi_endproc
> 

As I understand it, the result of __builtin_c[tl]z(0) is
undefined because the builtins compile to bs[rf], whose
destination is undefined for a zero source. So if we use these
builtins, an additional check for zero is required.

https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

> Built-in Function: int __builtin_clz (unsigned int x)
>   Returns the number of leading 0-bits in x, starting at the most
>   significant bit position. If x is 0, the result is undefined.
> 
> Built-in Function: int __builtin_ctz (unsigned int x)
>   Returns the number of trailing 0-bits in x, starting at the
>   least significant bit position. If x is 0, the result is
>   undefined.
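
The extra check could look roughly like the following. This is only
a sketch, assuming gcc or clang; the wrapper names safe_clz64 and
safe_ctz64 and the choice of returning 64 for a zero input are
illustrative assumptions, not anything from a posted patch.

```c
/*
 * Hypothetical wrappers (names are illustrative only).
 * __builtin_clzll(0) and __builtin_ctzll(0) are undefined, so the
 * zero case is handled before the builtin is reached.
 */
static inline int
safe_clz64(unsigned long long word)
{
    if (word == 0)
        return 64;              /* assumed convention: all 64 bits are zero */
    return __builtin_clzll(word);
}

static inline int
safe_ctz64(unsigned long long word)
{
    if (word == 0)
        return 64;              /* same assumed convention as above */
    return __builtin_ctzll(word);
}
```

Callers that can already prove the argument is nonzero could skip
the wrapper and call the builtin directly.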


Even worse, lzcntx silently "falls back" to bsrx on CPUs that do
not support it, which leads to bogus results: code built with
-mlzcnt then gives __builtin_clzll(8) = 3, where it should be 60.
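
To make the 3 versus 60 concrete, here is a small portable sketch of
the relationship between the two instructions. It does not execute
bsr or lzcnt itself; highest_bit_index and clz_from_bsr are
hypothetical helpers that only emulate the arithmetic.

```c
/*
 * Emulates the value bsr produces: the index of the highest set bit.
 * Only meaningful for x != 0, like bsr itself.
 */
static int
highest_bit_index(unsigned long long x)
{
    int idx = 0;

    while (x >>= 1)
        idx++;
    return idx;
}

/*
 * Leading-zero count as the bsr-based code computes it: for a 64-bit
 * operand, index ^ 63 equals 63 - index.
 */
static int
clz_from_bsr(unsigned long long x)
{
    return highest_bit_index(x) ^ 63;
}
```

For x = 8 the highest set bit is bit 3, so the bsr-style result is 3
and the leading-zero count is 3 ^ 63 = 60. Code built for lzcnt omits
the xor, so when the instruction is executed as bsr the raw index 3
comes back instead.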

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

