http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47815
--- Comment #3 from Adam Warner <adam at consulting dot net.nz> 2011-02-19 13:55:43 UTC ---
OK I finally understand. Tail call optimisation also disappears when the noreturn attribute is added to the leaf functions when compiled with gcc-4.5.

From my perspective this is a bug. A call to a function that does not return is a clear candidate for turning a call into a jump. At high levels of optimisation this should always trump any extra ease of debugging.

Additionally, it is important that adding debugging statements changes the main code as little as possible. Otherwise it may become very difficult to determine what is wrong using debugging statements, because the debugging statements themselves may be the cause of differently generated code.

If there is a printf in tail_call0() instead of an assert that evaluates to false then make_tail_calls() generates a jump to tail_call0(). If I add an assert statement to see what is going on then there is a chance different code is generated for make_tail_calls() depending on whether GCC can statically determine that tail_call0() does not return. You are likely to create a situation where adding debugging statements causes a bug to change or even disappear. This can be the most infuriating kind of bug. So I don't accept that this will always help with debugging. There's no point having an accurate backtrace of the wrong code.

Thirdly, I can eliminate many stack alignment instructions with tail calls. A call instruction pushes the return address on the stack, causing a 16-byte aligned stack to become misaligned. The parent function compensates by including a stack alignment instruction. A jump does not cause 16-byte stack misalignment. You are generating inferior code. With newer versions of gcc make_tail_calls() includes a stack alignment instruction (push %rcx at -Os). This will be the case even if a non-returning function is only called once.
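The debugging-statement concern can be sketched as a pair of leaf functions, one that prints and returns and one whose assert always fails. This is only an illustrative sketch: the names leaf_with_printf, leaf_with_failed_assert, caller_of_printf_leaf and caller_of_assert_leaf are hypothetical stand-ins, not the tail_call0() and make_tail_calls() from the earlier test case, and whether each call becomes a jump depends on the GCC version and options.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical leaf with a printf debugging statement.
   It returns normally, so a call to it in tail position
   is an ordinary candidate for becoming a jump. */
__attribute__((noinline)) int leaf_with_printf(int x)
{
    printf("leaf reached with %d\n", x);
    return x + 1;
}

/* Hypothetical leaf with an assert that always evaluates to false.
   Without NDEBUG, the compiler may infer that this never returns. */
__attribute__((noinline)) void leaf_with_failed_assert(void)
{
    assert(0 && "leaf reached");
}

/* The call below is the last thing the function does. */
__attribute__((noinline)) int caller_of_printf_leaf(int x)
{
    return leaf_with_printf(x);
}

/* Same shape as is_complete(): the leaf is only reached when i == 0. */
__attribute__((noinline)) void caller_of_assert_leaf(unsigned int i)
{
    if (i == 0)
        leaf_with_failed_assert();
}
```

Compiling this with -Os -S and comparing the two caller functions should show whether each call to a leaf was emitted as a jmp or as a call surrounded by stack adjustment.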
Here is an example:

$ cat no_tail_call_optimisation.c
#include <assert.h>

__attribute__((noinline)) void is_complete_helper() {
  assert("complete"=="");
}

__attribute__((noinline)) void is_complete(unsigned int i) {
  if(i==0) is_complete_helper();
}

int main() {
  for (unsigned int i=3000000000; ;--i) {
    is_complete(i);
  }
}

$ gcc-4.5 -std=gnu99 -Os no_tail_call_optimisation.c && time ./a.out
a.out: no_tail_call_optimisation.c:4: is_complete_helper: Assertion `"complete"==""' failed.
Aborted

real 0m8.014s
user 0m8.009s
sys 0m0.000s

$ gcc-4.6 --version
gcc-4.6 (Debian 4.6-20110216-1) 4.6.0 20110216 (experimental) [trunk revision 170225]

$ gcc-4.6 -std=gnu99 -Os no_tail_call_optimisation.c && time ./a.out
a.out: no_tail_call_optimisation.c:4: is_complete_helper: Assertion `"complete"==""' failed.
Aborted

real 0m10.015s
user 0m10.009s
sys 0m0.000s

So why does the version compiled with gcc-4.6 take two seconds longer to run? Compare the code generated for is_complete():

gcc-4.5:

0000000000400511 <is_complete>:
  400511: 85 ff             test   %edi,%edi
  400513: 75 07             jne    40051c <is_complete+0xb>
  400515: 31 c0             xor    %eax,%eax
  400517: e9 d8 ff ff ff    jmpq   4004f4 <is_complete_helper>
  40051c: c3                retq

gcc-4.6:

000000000040051e <is_complete>:
  40051e: 85 ff             test   %edi,%edi
  400520: 51                push   %rcx
  400521: 75 07             jne    40052a <is_complete+0xc>
  400523: 31 c0             xor    %eax,%eax
  400525: e8 da ff ff ff    callq  400504 <is_complete_helper>
  40052a: 5a                pop    %rdx
  40052b: c3                retq

6,000,000,000 additional push/pop instructions are executed with gcc-4.6. These stack alignment instructions are generated because a known tail call optimisation has been ignored for spurious reasons.

Tail call optimisation should be the default at high levels of optimisation regardless of whether a function returns. This bug only manifests more often in gcc-4.6 because of superior code inference.

Regards,
Adam