Issue 172097
Summary [missed-opt] [selectiondag] Bad codegen on shift of a 128-bit number by an assumed-small amount
Labels new issue
Assignees
Reporter purplesyringa
[Godbolt](https://godbolt.org/z/arEhKn1P1)

This is minimized from real code, so it looks odd. The `assume` is intended to show that the shift cannot overflow, so that the high half of `s` can be computed with something like `shld` on x86-64.

In reality, even with `assume(k < 64)`, the generated code still checks for `k & 64` (`with_assume`). Using `k & 63` as the shift amount instead generates good code and lets LLVM realize this is a rotate (`with_masking`). Interestingly, combining `k & 63` with the `assume` still produces bad code (`with_assume_and_masking`).

```cpp
#include <stdint.h>

uint64_t with_assume(uint64_t x, uint64_t k) {
    __builtin_assume(k < 64);
    __uint128_t s = (__uint128_t)x << k;
    return (uint64_t)s + (uint64_t)(s >> 64);
}

uint64_t with_assume_and_masking(uint64_t x, uint64_t k) {
    __builtin_assume(k < 64);
    __uint128_t s = (__uint128_t)x << (k & 63);
    return (uint64_t)s + (uint64_t)(s >> 64);
}

uint64_t with_masking(uint64_t x, uint64_t k) {
    __uint128_t s = (__uint128_t)x << (k & 63);
    return (uint64_t)s + (uint64_t)(s >> 64);
}
```

```asm
with_assume(unsigned long, unsigned long):
        mov     rcx, rsi
        xor     edx, edx
        shld    rdx, rdi, cl
        shl     rdi, cl
        xor     eax, eax
        test    cl, 64
        cmovne  rdx, rdi
        cmove   rax, rdi
        add     rax, rdx
        ret

with_assume_and_masking(unsigned long, unsigned long):
        mov     rcx, rsi
        xor     edx, edx
        shld    rdx, rdi, cl
        shl     rdi, cl
        xor     eax, eax
        test    cl, 64
        cmovne  rdx, rdi
        cmove   rax, rdi
        add     rax, rdx
        ret

with_masking(unsigned long, unsigned long):
        mov     rcx, rsi
        mov     rax, rdi
        rol     rax, cl
        ret
```

As far as I can tell, the bad codegen in `with_assume_and_masking` arises because the `assume` causes the `& 63` to be optimized out during constprop before isel; the `assume` is then removed during codegenprepare, so the backend has no idea the shift amount is bounded. For `with_assume` alone, removing the `assume` likewise leaves the backend unaware of the bound.
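For reference, here is a minimal sketch of why the sum of the two halves is a rotate whenever `k < 64`: the low half has its bottom `k` bits clear and the high half fits in those `k` bits, so the addition never carries. The helper names `sum_of_halves`, `rotl64` and `check_equivalence` are made up for illustration and are not part of the original code; `__uint128_t` is the GCC/Clang extension already used above.

```cpp
#include <cassert>
#include <cstdint>

// Sum of the low and high halves of a 128-bit left shift, as in the report.
// Precondition (assumed here, not checked): k < 64.
inline uint64_t sum_of_halves(uint64_t x, uint64_t k) {
    __uint128_t s = (__uint128_t)x << k;
    return (uint64_t)s + (uint64_t)(s >> 64);
}

// Reference rotate-left using 64-bit operations only.
// (-k & 63) keeps the right-shift amount in range when k == 0.
inline uint64_t rotl64(uint64_t x, uint64_t k) {
    return (x << (k & 63)) | (x >> (-k & 63));
}

// Hypothetical helper: spot-check that both forms agree for every k in [0, 64).
inline void check_equivalence(uint64_t x) {
    for (uint64_t k = 0; k < 64; ++k) {
        assert(sum_of_halves(x, k) == rotl64(x, k));
    }
}
```

Calling `check_equivalence` on a few values of `x` is only a sanity check, not a proof, but it matches the expectation that all three functions above could compile to a single `rol`.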
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
