| Issue | 160714 |
| --- | --- |
| Summary | Suboptimal/inconsistent codegen: clang doesn't realize an instruction save unless a while loop is manually transformed |
| Labels | clang |
| Assignees | |
| Reporter | 00001H |
Consider the following simple C++ function, which counts the holes in the digits of a string (0, 4, 6 and 9 have one hole each; 8 has two). It stops at the first byte below `'0'`, such as the NUL terminator, and is the same code as in #160710:
```cpp
#include<cstdint>
#include<array>
using ui = std::uint_fast32_t;
using uc = std::uint_fast8_t;
extern "C" ui countholes(const char* s){
    // Holes per digit, indexed by c - '0'.
    constexpr static std::array<ui,10> pre_table{1,0,0,0,1,0,1,0,2,1};
    uc c;
    ui tot = 0;
    while(true){
        c = uc(*s++);
        if(c<uc('0'))break; // any byte below '0' (e.g. NUL) ends the loop
        tot += pre_table[c-uc('0')];
    }
    return tot;
}
```
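To pin down what the function computes, here is a small cross-check. It is a sketch assuming the input holds only the digits `'0'`..`'9'` before a terminating byte below `'0'` (any other byte at or above `'0'` would index past the end of `pre_table`). `countholes_ref` and `agrees` are hypothetical helpers added for illustration; they are not part of the report.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using ui = std::uint_fast32_t;
using uc = std::uint_fast8_t;

// Table-driven version from the report.
extern "C" ui countholes(const char* s) {
    constexpr static std::array<ui, 10> pre_table{1, 0, 0, 0, 1, 0, 1, 0, 2, 1};
    ui tot = 0;
    while (true) {
        uc c = uc(*s++);
        if (c < uc('0')) break;  // stop at the first byte below '0'
        tot += pre_table[c - uc('0')];
    }
    return tot;
}

// Hypothetical reference implementation with an explicit switch.
// Same stopping rule: the byte is compared as unsigned, like uc(*s).
ui countholes_ref(const char* s) {
    ui tot = 0;
    for (; static_cast<unsigned char>(*s) >= '0'; ++s) {
        switch (*s) {
            case '0': case '4': case '6': case '9': tot += 1; break;
            case '8': tot += 2; break;
            default: break;  // '1', '2', '3', '5', '7' have no holes
        }
    }
    return tot;
}

// Hypothetical helper: true when both versions agree on s.
bool agrees(const char* s) { return countholes(s) == countholes_ref(s); }
```

For example, `countholes("1234567890")` should give 6: one hole each for 4, 6, 9 and 0, and two for 8.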
clang [generates](https://godbolt.org/z/zxdezbE5K) the following x86-64 assembly:
```asm
countholes:
        movzx   ecx, byte ptr [rdi]
        cmp     cl, 48
        jae     .LBB0_3
        xor     eax, eax
        ret
.LBB0_3:
        inc     rdi
        xor     eax, eax
        lea     rdx, [rip + countholes::pre_table]
.LBB0_4:
        movzx   ecx, cl
        add     ecx, -48
        add     rax, qword ptr [rdx + 8*rcx]
        movzx   ecx, byte ptr [rdi]
        inc     rdi
        cmp     cl, 47
        ja      .LBB0_4
        ret
countholes::pre_table:
        .quad   1
        .quad   0
        .quad   0
        .quad   0
        .quad   1
        .quad   0
        .quad   1
        .quad   0
        .quad   2
        .quad   1
```
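Here `add ecx, -48` subtracts 48 from `ecx`, and the next instruction uses the result only as the index in the effective address `[rdx + 8*rcx]`. The following `movzx` then overwrites `ecx`, so that address is the only use of the subtracted value. Because the index is scaled by 8 (each `pre_table` entry is a `.quad`), `8*(rcx - 48)` equals `8*rcx - 384`. The loop guards (`cmp cl, 48` / `jae` on entry, `cmp cl, 47` / `ja` afterwards) ensure `cl >= 48`, so the 32-bit `add` never wraps and its zero-extended result in `rcx` is exactly the original value minus 48. The `add` can therefore be removed and the `-48` folded into the address as a constant displacement of `-384`.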
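The displacement arithmetic can be checked exhaustively. This is a sketch that models the listing rather than the compiler: it takes the scale of 8 from `[rdx + 8*rcx]` (each table entry is a `.quad`) and confirms at compile time that, for every byte value allowed past the loop guard, the current sequence (32-bit `add ecx, -48`, implicit zero-extension into `rcx`, then scaling) yields the same byte offset as the folded form `8*rcx - 384`. The names `kScale`, `kBias`, `kDisp` and `fold_is_valid` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr long long kScale = 8;             // index scale in [rdx + 8*rcx]
constexpr long long kBias = 48;             // the character '0'
constexpr long long kDisp = -kScale * kBias;  // folded displacement
static_assert(kDisp == -384, "8 * -48 should be -384");

// For each value of cl that passes the guard (48..255), compare the byte
// offset produced by the current code with the folded form.
constexpr bool fold_is_valid() {
    for (std::uint32_t c = 48; c <= 255; ++c) {
        // add ecx, -48: a 32-bit add, modulo 2^32.
        std::uint32_t ecx = c + static_cast<std::uint32_t>(-48);
        // Writing ecx zero-extends into rcx.
        std::uint64_t rcx = ecx;
        long long current = kScale * static_cast<long long>(rcx);
        long long folded = kScale * static_cast<long long>(c) + kDisp;
        if (current != folded) return false;
    }
    return true;
}
static_assert(fold_is_valid(), "folding -384 into the address must be exact");
```

Starting the loop at 47 instead of 48 makes `fold_is_valid()` return false: the 32-bit add then wraps to `0xFFFFFFFF`, and the zero-extended `rcx` no longer matches. This is why the guard on `cl` is what makes the fold legal.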
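If we manually [transform the `while` loop into an `if` guard followed by a `do`-`while` loop](https://godbolt.org/z/Ys4nTbe9q), which is the shape clang already gives the original loop (note the early `cmp cl, 48` / `jae` at function entry above):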
```cpp
#include<cstdint>
#include<array>
using ui = std::uint_fast32_t;
using uc = std::uint_fast8_t;
extern "C" ui countholes(const char* s){
    constexpr static std::array<ui,10> pre_table{1,0,0,0,1,0,1,0,2,1};
    uc c;
    ui tot = 0;
    c = uc(*s++);
    if(c<uc('0')) return 0;
    do{
        tot += pre_table[c-uc('0')];
        c = uc(*s++);
    }while(c>=uc('0'));
    return tot;
}
```
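The two loop shapes can be compared directly. The sketch below, with hypothetical names `countholes_while` and `countholes_rotated`, puts the original top-tested loop next to the hand-rotated one. Under the same digits-only input assumption they compute the same result, which is why the difference in generated code is surprising.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using ui = std::uint_fast32_t;
using uc = std::uint_fast8_t;

// Holes per digit, indexed by c - '0'.
constexpr std::array<ui, 10> pre_table{1, 0, 0, 0, 1, 0, 1, 0, 2, 1};

// Original shape: the exit test runs at the top of every iteration.
ui countholes_while(const char* s) {
    ui tot = 0;
    while (true) {
        uc c = uc(*s++);
        if (c < uc('0')) break;
        tot += pre_table[c - uc('0')];
    }
    return tot;
}

// Rotated shape: one guard up front, then a bottom-tested do-while.
ui countholes_rotated(const char* s) {
    ui tot = 0;
    uc c = uc(*s++);
    if (c < uc('0')) return 0;
    do {
        tot += pre_table[c - uc('0')];
        c = uc(*s++);
    } while (c >= uc('0'));
    return tot;
}
```

For example, both functions should return 5 for `"98760"` (2 for the 8, and 1 each for 9, 6 and 0).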
The loop in the resulting assembly is:
```asm
; SNIP
.LBB0_4:
        movzx   ecx, cl
        add     rax, qword ptr [rdx + 8*rcx - 384]
        movzx   ecx, byte ptr [rdi]
        inc     rdi
        cmp     cl, 47
        ja      .LBB0_4
        ret
```
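The `-384` displacement is now folded into the effective address and the `add ecx, -48` is gone, saving one instruction per loop iteration. The compiler apparently performs this fold only when the loop has been inverted by hand, even though the two functions are equivalent and clang already gives the original loop the same guarded shape.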
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs