Issue 160714
Summary Suboptimal/inconsistent codegen — clang doesn't realize an instruction save unless a while loop is manually transformed
Labels clang
Assignees
Reporter 00001H
Consider the following simple C++ function, which counts the holes in the digits of a string (0, 4, 6, and 9 each have one hole; 8 has two). It is the same code as in #160710:
```cpp
#include<cstdint>
#include<array>
using ui = std::uint_fast32_t;
using uc = std::uint_fast8_t;
ui countholes(const char* s){
    constexpr static std::array<ui,10> pre_table{1,0,0,0,1,0,1,0,2,1};
    uc c;
    ui tot = 0;
    while(true){
        c = uc(*s++);
        if(c<uc('0'))break;
        tot += pre_table[c-uc('0')];
    }
    return tot;
}
```
clang [generates](https://godbolt.org/z/zxdezbE5K):
```asm
countholes:
        movzx   ecx, byte ptr [rdi]
        cmp     cl, 48
        jae     .LBB0_3
        xor     eax, eax
        ret
.LBB0_3:
        inc     rdi
        xor     eax, eax
        lea     rdx, [rip + countholes::pre_table]
.LBB0_4:
        movzx   ecx, cl
        add     ecx, -48
        add     rax, qword ptr [rdx + 8*rcx]
        movzx   ecx, byte ptr [rdi]
        inc     rdi
        cmp     cl, 47
        ja      .LBB0_4
        ret

countholes::pre_table:
        .quad   1
        .quad   0
        .quad   0
        .quad   0
        .quad   1
        .quad   0
        .quad   1
        .quad   0
        .quad   2
        .quad   1
```
In the loop, `add ecx, -48` subtracts 48 from `ecx` just before the next instruction uses `rcx` in an effective address. The following `movzx` then overwrites `ecx`, so the effective address is the only use of the adjusted value. The `add` can therefore be elided by folding it into the address as a constant displacement of `-384` (that is, `-48 * 8`, since the index is scaled by 8 for the 8-byte table entries).
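As a sanity check on that arithmetic, here is a minimal sketch showing that the two ways of forming the byte offset agree for every byte value that can reach the loop body (48 and above). The function names are made up for illustration and do not correspond to anything in clang or LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset into the table as the first listing computes it:
// subtract 48 from the character, then scale by 8.
constexpr std::int64_t offset_with_add(std::uint8_t c) {
    return (static_cast<std::int64_t>(c) - 48) * 8;
}

// Byte offset with the subtraction folded into a constant displacement:
// scale the raw character by 8, then add -384.
constexpr std::int64_t offset_with_displacement(std::uint8_t c) {
    return static_cast<std::int64_t>(c) * 8 - 384;
}

// Compare both forms for every byte value that passes the `c >= '0'` check.
inline bool offsets_agree() {
    for (int c = 48; c <= 255; ++c) {
        const auto b = static_cast<std::uint8_t>(c);
        if (offset_with_add(b) != offset_with_displacement(b)) return false;
    }
    return true;
}
```

Because the loop is only entered when the character is at least 48, the 32-bit subtraction never wraps, which is what makes the fold safe here.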
Now [transform the `while` loop into an `if` and a `do`-`while` loop](https://godbolt.org/z/Ys4nTbe9q) by hand, which is what clang seems to do anyway:
```cpp
#include<cstdint>
#include<array>
using ui = std::uint_fast32_t;
using uc = std::uint_fast8_t;
extern "C" ui countholes(const char* s){
    constexpr static std::array<ui,10> pre_table{1,0,0,0,1,0,1,0,2,1};
    uc c;
    ui tot = 0;
    c = uc(*s++);
    if(c<uc('0')) return 0;
    do{
        tot += pre_table[c-uc('0')];
        c = uc(*s++);
    }while(c>=uc('0'));
    return tot;
}
```
The assembly output is:
```asm
; SNIP
.LBB0_4:
        movzx   ecx, cl
        add     rax, qword ptr [rdx + 8*rcx - 384]
        movzx   ecx, byte ptr [rdi]
        inc     rdi
        cmp     cl, 47
        ja      .LBB0_4
        ret
```
Notice that the `-384` displacement is now folded into the effective address and the `add` to `ecx` is gone. For some reason, the compiler does not perform this optimization unless the loop is transformed by hand.
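To confirm that the two source forms are interchangeable, the sketch below places them side by side under different names so they can be compared on the same inputs. The names `countholes_while` and `countholes_rotated` are hypothetical, and the table is hoisted to namespace scope only to share it between the two. Neither version checks for characters above `'9'`, so the comparison assumes digit-only strings.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using ui = std::uint_fast32_t;
using uc = std::uint_fast8_t;

// Holes per digit 0..9, as in the report.
constexpr std::array<ui, 10> pre_table{1, 0, 0, 0, 1, 0, 1, 0, 2, 1};

// Original form: while(true) with a break.
ui countholes_while(const char* s) {
    ui tot = 0;
    while (true) {
        uc c = uc(*s++);
        if (c < uc('0')) break;  // also stops at the terminating NUL
        tot += pre_table[c - uc('0')];
    }
    return tot;
}

// Hand-transformed form: an if guard followed by a do-while loop.
ui countholes_rotated(const char* s) {
    ui tot = 0;
    uc c = uc(*s++);
    if (c < uc('0')) return 0;
    do {
        tot += pre_table[c - uc('0')];
        c = uc(*s++);
    } while (c >= uc('0'));
    return tot;
}
```

Both functions return the same totals on digit-only input, so any difference in the generated loop body comes from how the loop is written, not from what it computes.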