http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47815

--- Comment #3 from Adam Warner <adam at consulting dot net.nz> 2011-02-19 
13:55:43 UTC ---
OK, I finally understand. Tail call optimisation also disappears under gcc-4.5
when the noreturn attribute is added to the leaf functions.
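
For illustration, here is a minimal sketch of a leaf function carrying the
attribute. The names fail_helper, check and calls_checked are hypothetical and
are not taken from the test case in this report. Whether the call inside
check() is emitted as a jump or as a call depends on the GCC version and
flags, so compare the disassembly rather than treating this as definitive.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter, present only so the behaviour can be observed. */
static unsigned long calls_checked = 0;

/* Leaf function explicitly marked as never returning. */
__attribute__((noinline, noreturn)) void fail_helper(void) {
  fputs("fail_helper reached\n", stderr);
  abort();
}

/* The call to fail_helper() is the last thing check() does, so it is a
   candidate for being emitted as a jump instead of a call. */
__attribute__((noinline)) void check(unsigned int i) {
  ++calls_checked;
  if (i == 0) fail_helper();
}
```

Calling check() with non-zero arguments only increments the counter; passing
zero aborts the program.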

From my perspective this is a bug. A call to a function that does not return is
a clear candidate for turning a call into a jump. At high levels of
optimisation this should always trump any extra ease of debugging.

Additionally, it is important that adding debugging statements changes the main
code as little as possible. Otherwise it may become very difficult to work out
what is wrong, because the debugging statements themselves cause different code
to be generated.

If there is a printf in tail_call0() instead of an assert that evaluates to
false then make_tail_calls() generates a jump to tail_call0(). If I add an
assert statement to see what is going on then there is a chance different code
is generated for make_tail_calls() depending on whether GCC can statically
determine if tail_call0() does not return.
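
The two situations described above can be sketched as one file with a
compile-time switch. The signatures of tail_call0() and make_tail_calls(), the
USE_ASSERT macro and the tail_call0_returns counter are assumptions made for
this sketch, not the code from the original test case.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter so the printf variant can be observed. */
static int tail_call0_returns = 0;

#ifdef USE_ASSERT
/* Variant with an assert that can never pass: GCC may be able to work out
   statically that tail_call0() does not return. */
__attribute__((noinline)) void tail_call0(void) {
  assert(!"tail_call0 reached");
}
#else
/* Variant with a printf: an ordinary function that returns to its caller. */
__attribute__((noinline)) void tail_call0(void) {
  printf("tail_call0 reached\n");
  ++tail_call0_returns;
}
#endif

/* The call is in tail position in both variants; only the body of
   tail_call0() differs between the two builds. */
__attribute__((noinline)) void make_tail_calls(int which) {
  if (which == 0) tail_call0();
}
```

Building the file once as is and once with -DUSE_ASSERT, then comparing the
objdump -d output for make_tail_calls(), should show whether the debugging
statement changed the code generated for the caller.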

You are likely to create a situation where adding debugging statements causes a
bug to change or even disappear. This can be the most infuriating kind of bug.
So I don't accept that this will always help with debugging. There's no point
having an accurate backtrace of the wrong code.

Thirdly, I can eliminate many stack alignment instructions with tail calls. A
call instruction pushes the return address on the stack causing a 16-byte
aligned stack to become misaligned. The parent function compensates by
including a stack alignment instruction. A jump does not cause 16-byte stack
misalignment.
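
The arithmetic behind this can be modelled in a few lines of C. This is only a
model of the stack pointer as an integer, assuming the x86-64 convention that
the stack is 16-byte aligned at the point of a call and that both a return
address and a pushed register occupy 8 bytes. All names in it (do_call,
do_push, do_jmp, is_aligned, walk_through) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { SLOT = 8, ALIGNMENT = 16 };

static int is_aligned(uint64_t rsp) { return rsp % ALIGNMENT == 0; }

/* call pushes an 8-byte return address. */
static uint64_t do_call(uint64_t rsp) { return rsp - SLOT; }
/* push of a 64-bit register also moves the stack pointer by 8 bytes. */
static uint64_t do_push(uint64_t rsp) { return rsp - SLOT; }
/* jmp leaves the stack pointer untouched. */
static uint64_t do_jmp(uint64_t rsp) { return rsp; }

void walk_through(void) {
  uint64_t rsp = UINT64_C(0x7ffffffff000); /* arbitrary aligned value */
  assert(is_aligned(rsp));

  /* Entering the parent via call misaligns the stack by 8 bytes. */
  uint64_t in_parent = do_call(rsp);
  assert(!is_aligned(in_parent));

  /* Ordinary call path: the parent pushes a register to realign the stack
     before making its own call. */
  uint64_t before_call = do_push(in_parent);
  assert(is_aligned(before_call));

  /* Tail call path: the child is entered with the same stack pointer it
     would see if the original caller had called it directly. */
  uint64_t in_child = do_jmp(in_parent);
  assert(in_child == do_call(rsp));
}
```

In this model the extra push is needed only on the ordinary call path; on the
jump path the child sees the stack exactly as a directly called function would.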

You are generating inferior code. With newer versions of gcc make_tail_calls()
includes a stack alignment instruction (push %rcx at -Os). This will be the
case even if a non-returning function is only called once. Here is an example:

$ cat no_tail_call_optimisation.c 
#include <assert.h>

__attribute__((noinline)) void is_complete_helper() {
  assert("complete"=="");
}

__attribute__((noinline)) void is_complete(unsigned int i) {
  if(i==0) is_complete_helper();
}

int main() {
  for (unsigned int i=3000000000; ;--i) {
    is_complete(i);
  }
}

$ gcc-4.5 -std=gnu99 -Os no_tail_call_optimisation.c && time ./a.out
a.out: no_tail_call_optimisation.c:4: is_complete_helper: Assertion
`"complete"==""' failed.
Aborted

real    0m8.014s
user    0m8.009s
sys    0m0.000s

$ gcc-4.6 --version
gcc-4.6 (Debian 4.6-20110216-1) 4.6.0 20110216 (experimental) [trunk revision
170225]

$ gcc-4.6 -std=gnu99 -Os no_tail_call_optimisation.c && time ./a.out
a.out: no_tail_call_optimisation.c:4: is_complete_helper: Assertion
`"complete"==""' failed.
Aborted

real    0m10.015s
user    0m10.009s
sys    0m0.000s

So why does the version compiled with gcc-4.6 take two seconds longer to run?
Compare the code generated for is_complete():

gcc-4.5:
0000000000400511 <is_complete>:
  400511:       85 ff                   test   %edi,%edi
  400513:       75 07                   jne    40051c <is_complete+0xb>
  400515:       31 c0                   xor    %eax,%eax
  400517:       e9 d8 ff ff ff          jmpq   4004f4 <is_complete_helper>
  40051c:       c3                      retq  

gcc-4.6:
000000000040051e <is_complete>:
  40051e:       85 ff                   test   %edi,%edi
  400520:       51                      push   %rcx
  400521:       75 07                   jne    40052a <is_complete+0xc>
  400523:       31 c0                   xor    %eax,%eax
  400525:       e8 da ff ff ff          callq  400504 <is_complete_helper>
  40052a:       5a                      pop    %rdx
  40052b:       c3                      retq   

6,000,000,000 additional push/pop instructions are executed with gcc-4.6. These
stack alignment instructions are generated because a known tail call
optimisation has been ignored for spurious reasons.

Tail call optimisation should be the default at high levels of optimisation
regardless of whether a function returns. This bug only manifests more often in
gcc-4.6 because of superior code inference.

Regards,
Adam
