------- Comment #3 from rguenth at gcc dot gnu dot org 2010-04-25 20:03 -------
Well, the innermost loop with current trunk is

.L3:
        leal    -1(%ebx), %eax
        subl    $2, %ebx
        movl    %eax, (%esp)
        call    fib
        addl    %eax, %esi
        cmpl    $2, %ebx
        jg      .L3

which is pretty much optimal. The Intel compiler doesn't detect the
tail-recursion (huh) but has multiple entry-points into the function
and uses register passing conventions for the recursions.

With -fwhole-program GCC does the same (or with static fib), and we then
end up with a program faster than what ICC produces (16s).

A 4.3 compiled version is indeed a bit faster (as fast as 4.4 on i?86, 15.4s).
A 4.1 compiled version is even faster (14.1s), the 3.4 baseline is 21.5s.
That's on i?86-linux, all -O2.

4.1 assembly, fib is not inlined:

fib:
        pushl   %esi
        pushl   %ebx
        movl    %eax, %ebx
        cmpl    $2, %ebx
        movl    $1, %eax
        jle     .L5
        xorl    %esi, %esi
        .p2align 4,,7
.L6:
        leal    -1(%ebx), %eax
        subl    $2, %ebx
        call    fib
        addl    %eax, %esi
        cmpl    $2, %ebx
        jg      .L6
        leal    1(%esi), %eax
.L5:
        popl    %ebx
        popl    %esi
        ret

trunk assembler:

fib:
        pushl   %esi
        pushl   %ebx
        movl    %eax, %ebx
        subl    $4, %esp
        cmpl    $2, %ebx
        movl    $1, %eax
        jle     .L2
        xorl    %esi, %esi
        .p2align 4,,7
        .p2align 3
.L3:
        leal    -1(%ebx), %eax
        subl    $2, %ebx
        call    fib
        addl    %eax, %esi
        cmpl    $2, %ebx
        jg      .L3
        leal    1(%esi), %eax
.L2:
        addl    $4, %esp
        popl    %ebx
        popl    %esi
        ret

where the only differences are the loop alignment and keeping the
stack 16-byte aligned. Indeed we get the same speed as 4.1 when building
with -mpreferred-stack-boundary=2.

Why do we bother to keep the stack aligned for leaf functions?

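For reference, the loop in the listings above corresponds roughly to the C below. This is only a sketch: the testcase source is not quoted in this comment, so it assumes the usual doubly recursive definition with fib(1) = fib(2) = 1, and the names fib_ref and fib_loop are made up for illustration. fib_loop is a hand-written equivalent of what tail-recursion elimination appears to produce: the fib(n-2) call becomes the loop, the fib(n-1) call stays a real call, and the final +1 is the base case reached when the loop exits.

```c
/* Assumed shape of the testcase: plain doubly recursive Fibonacci. */
unsigned long fib_ref(int n)
{
    if (n <= 2)
        return 1;
    return fib_ref(n - 1) + fib_ref(n - 2);
}

/* Hand-written equivalent of the loop in the assembly listings:
 * %ebx holds n, %esi holds acc. */
unsigned long fib_loop(int n)
{
    unsigned long acc = 0;       /* xorl %esi, %esi */

    if (n <= 2)                  /* cmpl $2, %ebx ; jle */
        return 1;

    do {
        acc += fib_loop(n - 1);  /* leal -1(%ebx), %eax ; call fib ; addl */
        n -= 2;                  /* subl $2, %ebx */
    } while (n > 2);             /* cmpl $2, %ebx ; jg */

    return acc + 1;              /* leal 1(%esi), %eax */
}
```

Both functions should return the same value for any n; only one recursive call per iteration remains in fib_loop, which is why the per-call prologue and epilogue cost (including the extra stack adjustment) shows up so clearly in the timings.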
-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl at gcc dot gnu dot org,
                   |                            |hubicka at gcc dot gnu dot
                   |                            |org
          Component|c++                         |target
 GCC target triplet|                            |i?86-*-*
           Keywords|                            |missed-optimization
      Known to work|                            |4.1.3
            Summary|[4.4/4.5 Regression]        |[4.4/4.5/4.6 Regression]
                   |Performance degradation for |Performance degradation for
                   |simple fibonacci numbers    |simple fibonacci numbers
                   |calculation                 |calculation due to extra
                   |                            |stack alignment
   Target Milestone|---                         |4.4.4


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43884