https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101973

            Bug ID: 101973
           Summary: subtraction of clz is not optimized
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sven.koehler at gmail dot com
  Target Milestone: ---

On Intel x86_64, the generated code for __builtin_clz(x) looks something like
this: clz(x) = 63 - bsr(x)

Since Intel does not seem to have a way to do 63-y in a single instruction, XOR
is used instead and the actual assembly code corresponds to clz(x) = 63 ^
bsr(x). Since bsr(x) is in the range 0 to 63, the XOR with 63 is equivalent to
63-y.

However, when we actually need the index of the most significant non-zero bit,
we have another 63-y, as in this function: 

int bsr(unsigned long x) {
    return sizeof(x)*8 - 1 - __builtin_clzl(x);
}


With -O3, GCC emits the following assembly code:

bsr:
        bsr     rdi, rdi
        mov     eax, 63
        xor     rdi, 63
        sub     eax, edi
        ret


The XOR with 63 and the subtraction from 63 cancel each other out in this
special case. LLVM/clang performs this optimization.

One might also consider the arbitrary case of z-clz(x) as a test case. On
Intel, this is equivalent to bsr(x)+(z-63).

Reply via email to