Issue: 137836
Summary: Performance regression / bad code generation on PPC603e CPU since clang/llvm v15
Labels: clang
Assignees: (none)
Reporter: andyg1001

When compiled with the options "-target powerpc-unknown-linux-gnu -mcpu=603e -O2", the two functions below should produce almost identical code, since they differ only in whether the buffer is a global or a passed-in pointer:
```
static char buffer[16];

int test1(int offset1, int offset2, int value)
{
    *(int*)(buffer + offset1) = value;
    return *(int*)(buffer + offset2);
}

int test2(char* buffer, int offset1, int offset2, int value)
{
    *(int*)(buffer + offset1) = value;
    return *(int*)(buffer + offset2);
}
```
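As a sanity check that the two variants really are semantically equivalent, a minimal host-side sketch follows. The names `global_buffer`, `store_load_global`, `store_load_param`, `variants_agree` and `run_checks` are hypothetical and not part of the report, and `memcpy` is swapped in for the pointer casts so that the check itself stays well-defined on any host. It says nothing about the generated PowerPC code; it only illustrates why near-identical output is a reasonable expectation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the global buffer in the report. */
static char global_buffer[16];

/* Like test1: store then load through a global buffer.
   memcpy replaces the casts (an assumption made for portability). */
int store_load_global(int offset1, int offset2, int value)
{
    int result;
    memcpy(global_buffer + offset1, &value, sizeof value);
    memcpy(&result, global_buffer + offset2, sizeof result);
    return result;
}

/* Like test2: the same operations through a passed-in pointer. */
int store_load_param(char *buf, int offset1, int offset2, int value)
{
    int result;
    memcpy(buf + offset1, &value, sizeof value);
    memcpy(&result, buf + offset2, sizeof result);
    return result;
}

/* Returns 1 if both variants agree for every pair of int-sized slots. */
int variants_agree(void)
{
    char local[16];
    const int step = (int)sizeof(int);

    for (int o1 = 0; o1 + step <= 16; o1 += step) {
        for (int o2 = 0; o2 + step <= 16; o2 += step) {
            /* Start both buffers from the same zeroed state. */
            memset(global_buffer, 0, sizeof global_buffer);
            memset(local, 0, sizeof local);

            int value = 1000 + o1 * 16 + o2;
            if (store_load_global(o1, o2, value) !=
                store_load_param(local, o1, o2, value)) {
                return 0;
            }
        }
    }
    return 1;
}

/* Aborts via assert if the two variants ever disagree. */
void run_checks(void)
{
    assert(variants_agree());
}
```

Calling `run_checks()` from a small `main` is enough to exercise it. The only difference between the two variants is where the buffer address comes from, which is exactly the part that changes in the assembly.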
On clang/llvm up to and including version 14, this is the correct (tested and functional) output (https://godbolt.org/z/qr9dn5che):
```
test1(int, int, int):
        lis 6, _ZL6buffer@ha
        la 6, _ZL6buffer@l(6)
        stwx 5, 6, 3
        lwzx 3, 6, 4
        blr
test2(char*, int, int, int):
        stwx 6, 3, 4
        lwzx 3, 3, 5
        blr
```
From clang/llvm version 15 through current trunk, however, this is the output (https://godbolt.org/z/qfzccdMqc):
```
.L0$poff:
        .long .LTOC-.L0$pb
test1(int, int, int):
        mflr 0
        stw 0, 4(1)
        stwu 1, -16(1)
        stw 30, 8(1)
        bl .L0$pb
.L0$pb:
        mflr 30
        lwz 6, .L0$poff-.L0$pb(30)
        add 30, 6, 30
        lwz 6, .LC0-.LTOC(30)
        stwx 5, 6, 3
        lwzx 3, 6, 4
        lwz 0, 20(1)
        lwz 30, 8(1)
        addi 1, 1, 16
        mtlr 0
        blr
test2(char*, int, int, int):
        stwx 6, 3, 4
        lwzx 3, 3, 5
        blr
.LC0:
        .long _ZL6buffer
```
This is only test code to demonstrate the issue. In the actual production code, which is too complex to post here, the same explosion of code in the 'test1' case (a global buffer rather than a passed-in pointer) causes a significant performance regression, which prevents the adoption of newer clang compilers.