| Issue |
128722
|
| Summary |
Suboptimal register use on x86-64
|
| Labels |
new issue
|
| Assignees |
|
| Reporter |
tavianator
|
Saw this while looking at https://github.com/llvm/llvm-project/issues/128441, but this seems like a more general issue. This C code
```c
int foo(int *a, int *b, int size) {
int ret = 0;
for (int i = 0; i < size; ++i) {
int diff = a[i] ^ b[i];
ret += diff;
if (!diff) {
break;
}
}
return ret;
}
```
[compiles](https://godbolt.org/z/dK74hY1r4) at all optimization levels to:
```asm
foo:
test edx, edx
jle .LBB0_1
mov ecx, edx
dec rcx
xor edx, edx
xor eax, eax
.LBB0_3:
mov r8d, eax
mov r9d, dword ptr [rdi + 4*rdx]
mov r10d, dword ptr [rsi + 4*rdx]
mov eax, r10d
xor eax, r9d
add eax, r8d
xor r10d, r9d
je .LBB0_5
lea r8, [rdx + 1]
cmp rcx, rdx
mov rdx, r8
jne .LBB0_3
.LBB0_5:
ret
.LBB0_1:
xor eax, eax
ret
```
It keeps moving the accumulator from `eax` to `r8d` and back. It also materializes the `xor` result twice when it doesn't need to. Ideally I think it would look more like this:
```asm
.LBB0_3:
mov r8d, dword ptr [rdi + 4*rdx]
xor r8d, dword ptr [rsi + 4*rdx]
je .LBB0_5
add eax, r8d
inc edx
cmp ecx, edx
jne .LBB0_3
```
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs