At Thu, 14 Feb 2019 16:45:38 -0500, Tom Lane <t...@sss.pgh.pa.us> wrote in <822.1550180...@sss.pgh.pa.us> > Andres Freund <and...@anarazel.de> writes: > > On 2019-02-14 15:47:13 -0300, Alvaro Herrera wrote: > >> Hah, I just realized you have to add -mlzcnt in order for these builtins > >> to use the lzcnt instructions. It goes from something like > >> > >> bsrq %rax, %rax > >> xorq $63, %rax > > > I'm confused how this is a general count leading zero operation? Did you > > use constants or something that allowed ot infer a range in the test? If > > so the compiler probably did some optimizations allowing it to do the > > above. > > No. If you compile > > int myclz(unsigned long long x) > { > return __builtin_clzll(x); > } > > at -O2, on just about any x86_64 gcc, you will get > > myclz: > .LFB1: > .cfi_startproc > bsrq %rdi, %rax > xorq $63, %rax > ret > .cfi_endproc >
I understand that the behavior of __builtin_c[tl]z(0) is undefined from the reason, they convert to bs[rf]. So if we use these builtins, additional check is required. https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html > Built-in Function: int __builtin_clz (unsigned int x) > Returns the number of leading 0-bits in x, starting at the most > significant bit position. If x is 0, the result is undefined. > > Built-in Function: int __builtin_ctz (unsigned int x) > Returns the number of trailing 0-bits in x, starting at the > least significant bit position. If x is 0, the result is > undefined. And even worse lzcntx is accidentially "fallback"s to bsrx on unsupported CPUs, which leads to bogus results. __builtin_clzll(8) = 3, which should be 60. regards. -- Kyotaro Horiguchi NTT Open Source Software Center