Issue 120339
Summary Redundant Copying of Large Struct Parameter to Stack When Passed to Another Function
Labels new issue
Assignees
Reporter jonathan-gruber-jg
When a function receives a large struct by value and passes that same struct by value to another function, Clang redundantly copies the struct parameter to the stack before the call.

A minimal test case is in the attached file test.c.txt (GitHub would not allow me to upload it with the .c extension, sadly), reproduced below for your convenience:
```
struct S {
	void *x, *y, *z, *w;
};

extern int extern_func(struct S);

int tail_call(struct S x) {
	return extern_func(x);
}

int non_tail_call(struct S x) {
	return ~extern_func(x);
}
```
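To run the test case rather than only inspect its assembly, something like the following sketch may help. It assumes a hypothetical definition of `extern_func` (the real one is external and unspecified) that deliberately overwrites its by-value parameter, plus a hypothetical helper `check_wrappers`. It only checks that the C-level by-value semantics hold, which any fix that avoids the copy would still have to preserve. Because everything lives in one translation unit here, the compiler may inline `extern_func`, so this is not a way to observe the redundant copy itself.

```c
#include <assert.h>
#include <stddef.h>

struct S {
	void *x, *y, *z, *w;
};

/* Hypothetical stand-in for extern_func: it overwrites its by-value
 * parameter, which must never be visible to the caller. */
int extern_func(struct S s) {
	int was_set = (s.x != NULL);
	s.x = s.y = s.z = s.w = NULL; /* only the callee's copy changes */
	return was_set;
}

int tail_call(struct S x) {
	return extern_func(x);
}

int non_tail_call(struct S x) {
	return ~extern_func(x);
}

/* Hypothetical helper: returns 1 if both wrappers return the expected
 * values and the caller's struct is left untouched. */
int check_wrappers(void) {
	int dummy = 0;
	struct S s = { &dummy, &dummy, &dummy, &dummy };
	int a = tail_call(s);
	int b = non_tail_call(s);
	/* The callee's writes must not reach the caller's object. */
	assert(s.x == &dummy && s.w == &dummy);
	return a == 1 && b == ~1;
}
```

Calling `check_wrappers()` from a `main` and building at each optimization level gives a quick semantic check; to look at the code generation issue itself, `extern_func` should stay in a separate file as in the original test case.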

I only tested the target architectures x86_64, aarch64, and riscv64, but I would not be surprised if other target architectures exhibit the same inefficiency.

Host system: Arch Linux, x86_64.

Clang version: official Arch Linux package of clang, version 18.1.8-4.

Command line to reproduce results: clang -c test.c --target=<arch> -O<opt-level>

x86_64 assembly (Intel syntax), with -Oz, -Os, -O2, or -O3
```
tail_call:
    push rbp
    mov  rbp,rsp
    pop  rbp
    jmp  extern_func

non_tail_call:
    push   rbp
    mov    rbp,rsp
    sub    rsp,0x20
    movaps xmm0,XMMWORD PTR [rbp+0x10]
    movaps xmm1,XMMWORD PTR [rbp+0x20]
    movups XMMWORD PTR [rsp+0x10],xmm1
    movups XMMWORD PTR [rsp],xmm0
    call   extern_func
    not    eax
    add    rsp,0x20
    pop    rbp
    ret
```

aarch64 assembly, with -Oz, -Os, -O2, or -O3
```
tail_call:
    sub sp, sp, #0x30
    stp x29, x30, [sp, #32]
    add x29, sp, #0x20
    ldp q0, q1, [x0]
    mov x0, sp
    stp q0, q1, [sp]
    bl  extern_func
    ldp x29, x30, [sp, #32]
    add sp, sp, #0x30
    ret

non_tail_call:
    sub sp, sp, #0x30
    stp x29, x30, [sp, #32]
    add x29, sp, #0x20
    ldp q0, q1, [x0]
    mov x0, sp
    stp q0, q1, [sp]
    bl  extern_func
    mvn w0, w0
    ldp x29, x30, [sp, #32]
    add sp, sp, #0x30
    ret
```

riscv64 assembly, with -Oz, -Os, -O2, or -O3
```
tail_call:
    addi  sp,sp,-48
    sd    ra,40(sp)
    ld    a1,24(a0)
    ld    a2,16(a0)
    ld    a3,8(a0)
    ld    a0,0(a0)
    sd    a1,32(sp)
    sd    a2,24(sp)
    sd    a3,16(sp)
    sd    a0,8(sp)
    addi  a0,sp,8
    auipc ra,0x0
    jalr  ra # extern_func
    ld    ra,40(sp)
    addi  sp,sp,48
    ret

non_tail_call:
    addi  sp,sp,-48
    sd    ra,40(sp)
    ld    a1,24(a0)
    ld    a2,16(a0)
    ld    a3,8(a0)
    ld    a0,0(a0)
    sd    a1,32(sp)
    sd    a2,24(sp)
    sd    a3,16(sp)
    sd    a0,8(sp)
    addi  a0,sp,8
    auipc ra,0x0
    jalr  ra # extern_func
    not   a0,a0
    ld    ra,40(sp)
    addi  sp,sp,48
    ret
```

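Only the x86_64 tail call avoids the copy, and even it is only semi-optimal: it pointlessly sets up and immediately tears down a frame (push rbp, mov rbp,rsp, pop rbp) before the unconditional branch to extern_func. The aarch64 and riscv64 tail calls are not emitted as tail calls at all; they copy the struct and make an ordinary call.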

Please let me know if I should include anything else in this bug report.

[test.c.txt](https://github.com/user-attachments/files/18172937/test.c.txt)
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
