| Issue | 120339 |
| Summary | Redundant Copying of Large Struct Parameter to Stack When Passed to Another Function |
| Labels | new issue |
| Assignees | |
| Reporter | jonathan-gruber-jg |
When a function receives a large struct by value and passes that parameter unchanged to another function, Clang redundantly copies the struct to the stack before the call.
A minimal test case is in the attached file test.c.txt (GitHub would not allow me to upload it with the .c extension, sadly), reproduced below for your convenience:
```
struct S {
    void *x, *y, *z, *w;
};

extern int extern_func(struct S);

int tail_call(struct S x) {
    return extern_func(x);
}

int non_tail_call(struct S x) {
    return ~extern_func(x);
}
```
I only tested the target architectures x86_64, aarch64, and riscv64, but I would not be surprised if other target architectures exhibit the same inefficiency.
Host system: Arch Linux, x86_64.
Clang version: official Arch Linux package of clang, version 18.1.8-4.
Command line to reproduce results: clang -c test.c --target=<arch> -O<opt-level>
x86_64 assembly (Intel syntax), with -Oz, -Os, -O2, or -O3:
```
tail_call:
push rbp
mov rbp,rsp
pop rbp
jmp extern_func
non_tail_call:
push rbp
mov rbp,rsp
sub rsp,0x20
movaps xmm0,XMMWORD PTR [rbp+0x10]
movaps xmm1,XMMWORD PTR [rbp+0x20]
movups XMMWORD PTR [rsp+0x10],xmm1
movups XMMWORD PTR [rsp],xmm0
call extern_func
not eax
add rsp,0x20
pop rbp
ret
```
aarch64 assembly, with -Oz, -Os, -O2, or -O3:
```
tail_call:
sub sp, sp, #0x30
stp x29, x30, [sp, #32]
add x29, sp, #0x20
ldp q0, q1, [x0]
mov x0, sp
stp q0, q1, [sp]
bl extern_func
ldp x29, x30, [sp, #32]
add sp, sp, #0x30
ret
non_tail_call:
sub sp, sp, #0x30
stp x29, x30, [sp, #32]
add x29, sp, #0x20
ldp q0, q1, [x0]
mov x0, sp
stp q0, q1, [sp]
bl extern_func
mvn w0, w0
ldp x29, x30, [sp, #32]
add sp, sp, #0x30
ret
```
riscv64 assembly, with -Oz, -Os, -O2, or -O3:
```
tail_call:
addi sp,sp,-48
sd ra,40(sp)
ld a1,24(a0)
ld a2,16(a0)
ld a3,8(a0)
ld a0,0(a0)
sd a1,32(sp)
sd a2,24(sp)
sd a3,16(sp)
sd a0,8(sp)
addi a0,sp,8
auipc ra,0x0
jalr ra # extern_func
ld ra,40(sp)
addi sp,sp,48
ret
non_tail_call:
addi sp,sp,-48
sd ra,40(sp)
ld a1,24(a0)
ld a2,16(a0)
ld a3,8(a0)
ld a0,0(a0)
sd a1,32(sp)
sd a2,24(sp)
sd a3,16(sp)
sd a0,8(sp)
addi a0,sp,8
auipc ra,0x0
jalr ra # extern_func
not a0,a0
ld ra,40(sp)
addi sp,sp,48
ret
```
Only tail_call on x86_64 avoids the redundant copy, and even there the frame-pointer setup and teardown (push rbp, mov rbp,rsp, pop rbp) before the unconditional jump to extern_func is pointless.
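As a rough illustration of what a copy-free lowering could look like on the two targets where the listings show the struct arriving by address (x0 on aarch64, a0 on riscv64), here is a C-level sketch. It is not Clang's output. The `_lowered` function names and the `last_seen` global are hypothetical and exist only to make the forwarding visible. The sketch assumes the ABI permits a callee to hand the caller-made copy onward once its own parameter is no longer used.

```c
#include <stddef.h>

struct S {
    void *x, *y, *z, *w;
};

/* Hypothetical: records the address the callee actually received. */
static const struct S *last_seen;

/* Hypothetical stand-in for extern_func after ABI lowering: the 32-byte
 * struct is represented by a pointer to a copy made by the caller. */
int extern_func_lowered(struct S *p) {
    last_seen = p;
    return p->x != NULL;
}

/* Forwards the incoming pointer unchanged, so no second copy is made. */
int tail_call_lowered(struct S *p) {
    return extern_func_lowered(p);
}

/* Same forwarding, followed by the bitwise NOT from the test case. */
int non_tail_call_lowered(struct S *p) {
    return ~extern_func_lowered(p);
}
```

After a call such as `tail_call_lowered(&s)`, `last_seen` equals `&s`: the callee sees the copy the wrapper was given rather than a fresh one.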
Please let me know if I should include anything else in this bug report.
[test.c.txt](https://github.com/user-attachments/files/18172937/test.c.txt)
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs