Issue 137836
Summary Performance regression / bad code generation on PPC603e CPU since clang/llvm v15
Labels clang
Assignees
Reporter andyg1001
Compiled with the options "-target powerpc-unknown-linux-gnu -mcpu=603e -O2", the following two functions should produce almost identical code:

```
static char buffer[16];

int test1(int offset1, int offset2, int value)
{
    *(int*)(buffer + offset1) = value;
    return *(int*)(buffer + offset2);
}

int test2(char* buffer, int offset1, int offset2, int value)
{
    *(int*)(buffer + offset1) = value;
    return *(int*)(buffer + offset2);
}
```

On clang/llvm up to and including version 14, this is the correct (tested and functional) output (https://godbolt.org/z/qr9dn5che):

```
test1(int, int, int):
        lis 6, _ZL6buffer@ha
        la 6, _ZL6buffer@l(6)
        stwx 5, 6, 3
        lwzx 3, 6, 4
        blr

test2(char*, int, int, int):
        stwx 6, 3, 4
        lwzx 3, 3, 5
        blr
```

From clang/llvm version 15 through current trunk, however, this is the output (https://godbolt.org/z/qfzccdMqc):

```
.L0$poff:
        .long .LTOC-.L0$pb
test1(int, int, int):
        mflr 0
        stw 0, 4(1)
        stwu 1, -16(1)
        stw 30, 8(1)
        bl .L0$pb
.L0$pb:
        mflr 30
        lwz 6, .L0$poff-.L0$pb(30)
        add 30, 6, 30
        lwz 6, .LC0-.LTOC(30)
        stwx 5, 6, 3
        lwzx 3, 6, 4
        lwz 0, 20(1)
        lwz 30, 8(1)
        addi 1, 1, 16
        mtlr 0
        blr

test2(char*, int, int, int):
        stwx 6, 3, 4
        lwzx 3, 3, 5
        blr

.LC0:
        .long   _ZL6buffer
```

This is only test code to demonstrate the issue. In the actual production code, which is too complex to post here, this explosion of code in the 'test1' case (which uses a global buffer rather than a passed-in pointer) causes a significant performance regression and prevents the adoption of newer clang compilers.
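
For anyone reproducing this, a minimal sketch of a harness may help confirm that the two functions behave identically, so that any difference is purely in code generation. `test1` and `test2` are copied from the report; `results_match` and its local storage are hypothetical additions for illustration only. Like the original, the sketch relies on accessing an `int` through a `char` buffer and assumes the offsets are multiples of `sizeof(int)` and lie within the 16-byte buffer.

```c
#include <string.h>

static char buffer[16];

/* test1 from the report: store and load through the file-scope buffer. */
int test1(int offset1, int offset2, int value)
{
    *(int*)(buffer + offset1) = value;
    return *(int*)(buffer + offset2);
}

/* test2 from the report: same body, but the buffer is passed in. */
int test2(char* buffer, int offset1, int offset2, int value)
{
    *(int*)(buffer + offset1) = value;
    return *(int*)(buffer + offset2);
}

/* Hypothetical helper (not part of the report): runs both functions on
   zeroed 16-byte buffers and returns 1 if they return the same value. */
int results_match(int offset1, int offset2, int value)
{
    int local[4]; /* int-typed storage, so the passed-in buffer is aligned */

    memset(buffer, 0, sizeof buffer);
    memset(local, 0, sizeof local);

    return test1(offset1, offset2, value)
        == test2((char*)local, offset1, offset2, value);
}
```

Calling `results_match(0, 4, 42)` from a small `main`, for example, should return 1 on any compiler version, since both functions store to one offset and load from another in a zeroed buffer.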