Issue 128441
Summary Suboptimal codegen involving tzcnt
Labels new issue
Assignees
Reporter tavianator
    Here is a simple loop to find the index of the first different bit in two arrays:

```c
unsigned long crit_bit(unsigned long *a, unsigned long *b, unsigned long size) {
    unsigned long ret = 0;
#pragma nounroll
    for (unsigned long i = 0; i < size; ++i) {
 unsigned long diff = a[i] ^ b[i];
        int bits = 8 * sizeof(unsigned long);
        ret += __builtin_ctzg(diff, bits);
        if (diff) {
 break;
        }
    }
    return ret;
}
```

On x86-64 with `-O3 -mbmi`, the loop generates

```asm
.LBB0_4:
        mov     r9, rax
        mov     r10, qword ptr [rdi + 8*r8]
        mov     r11, qword ptr [rsi + 8*r8]
        mov     rax, r11
        xor     rax, r10
 tzcnt   rax, rax
        mov     rbx, r11
        xor     rbx, r10
 cmove   rax, rcx
        add     rax, r9
        xor     r11, r10
 jne     .LBB0_6
        lea     r9, [r8 + 1]
        cmp     rdx, r8
 mov     r8, r9
        jne     .LBB0_4
```

First, it's using `cmove` to get 64 in case the input was zero, but `tzcnt` already returns 64 in that case so that's redundant.

Second, it could use the flags after `tzcnt` to handle the `if (diff) break`, but instead it re-does the comparison.

Better code would look something like this:

```asm
 xor     rax, r10
        tzcnt   rax, rax
        lea     rax, [rax + r9]
 jc      .LBB0_6
```
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs

Reply via email to