While looking at PR42722 I noticed that gcc generates inefficient code for a
tail call that simply passes a large struct parameter through unchanged.
> cat bug1.c
struct s1 { int x[16]; };
extern void g1(struct s1);
void f1(struct s1 s1) { g1(s1); }
struct s2 { int x[17]; };
extern void g2(struct s2);
void f2(struct s2 s2) { g2(s2); }
> gcc -O2 -fomit-frame-pointer -S bug1.c
> cat bug1.s
	.file "bug1.c"
	.text
	.p2align 4,,15
.globl f1
	.type f1, @function
f1:
	subl $12, %esp
	addl $12, %esp
	jmp g1
	.size f1, .-f1
	.p2align 4,,15
.globl f2
	.type f2, @function
f2:
	subl $12, %esp
	movl $17, %ecx
	movl %edi, 8(%esp)
	leal 16(%esp), %edi
	movl %esi, 4(%esp)
	movl %edi, %esi
	rep movsl
	movl 4(%esp), %esi
	movl 8(%esp), %edi
	addl $12, %esp
	jmp g2
	.size f2, .-f2
	.ident "GCC: (GNU) 4.5.0 20100128 (experimental)"
	.section .note.GNU-stack,"",@progbits
There are two problems with this code:
1. For the larger struct gcc generates a block copy with identical source and
destination addresses (the rep movsl in f2 runs with %esi == %edi and a count
of 17 words), which amounts to a very slow NOP.
2. For the smaller struct gcc manages to eliminate the block copy, but it
leaves a pointless stack adjustment behind in f1 (subl $12, %esp immediately
followed by addl $12, %esp). gcc-4.3 generates no such stack adjustment:
.globl f1
	.type f1, @function
f1:
	jmp g1
	.size f1, .-f1
	.ident "GCC: (GNU) 4.3.5 20100103 (prerelease)"
so this is a code size and performance regression in 4.4/4.5 relative to 4.3.
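
To illustrate problem 1, here is a minimal C sketch of what the rep movsl in
f2 amounts to. The helper names rep_movsl_model and self_copy_is_nop are
hypothetical and exist only for this illustration; the loop is a simplified
model of the instruction (a forward copy of 32-bit words), not gcc's actual
expansion of the struct copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct s2 { int x[17]; };

/* Hypothetical, simplified model of `rep movsl`:
 * copy `count` words from src to dst, walking forward. */
static void rep_movsl_model(int *dst, const int *src, size_t count)
{
    while (count--)
        *dst++ = *src++;
}

/* Returns 1 if copying a struct s2 onto itself leaves it unchanged,
 * i.e. the 17 loads and 17 stores accomplish nothing. */
static int self_copy_is_nop(void)
{
    struct s2 before, s;
    size_t i;

    /* The comparison below assumes the struct has no padding. */
    assert(sizeof(struct s2) == 17 * sizeof(int));

    for (i = 0; i < 17; i++)
        s.x[i] = (int)(i * 3 + 1);
    before = s;

    /* Same address for source and destination, as in f2,
     * where %esi and %edi both hold 16(%esp). */
    rep_movsl_model(s.x, s.x, 17);

    return memcmp(&before, &s, sizeof s) == 0;
}
```

Calling self_copy_is_nop() from a test driver should return 1: the copy reads
and rewrites every word without changing anything, so the whole sequence,
including the spills and reloads of %esi and %edi around it, could be dropped.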
--
Summary: inefficient code for trivial tail-call with large struct
parameter
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: mikpe at it dot uu dot se
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42909