https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116329

            Bug ID: 116329
           Summary: Arm M0+ doesn't do tail-call optimization
           Product: gcc
           Version: 13.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: terrygreeniaus at gmail dot com
  Target Milestone: ---

Godbolt link: https://godbolt.org/z/9vMTzx4dq

Building with -mcpu=cortex-m0plus on gcc 13.3.1 shows that gcc doesn't perform
tail-call optimization:

    #include <stdint.h>

    uint32_t x;

    void __attribute__((noinline)) foo()
    {
        x = 1;
    }

    void bar()
    {
        foo();
    }

Disassembles as:

    foo():
            movs    r2, #1
            ldr     r3, .L3
            str     r2, [r3]
            bx      lr
    .L3:
            .word   .LANCHOR0
    bar():
            push    {r4, lr}
            bl      foo()
            pop     {r4, pc}
    x:
            .space  4

Compiling with -mcpu=cortex-m4 does the right thing:

    foo():
            ldr     r3, .L3
            movs    r2, #1
            str     r2, [r3]
            bx      lr
    .L3:
            .word   .LANCHOR0
    bar():
            b       foo()
    x:
            .space  4

I purposely made the code not just trivially call an extern function in case
there was an issue with M0+ not having wide enough instructions to just branch
anywhere; in this contrived example it only needs to branch back a little bit
so should have no problem with the direct branch.

I observed this with arm-none-eabi-gcc 13.3.1, but also experimenting in
Godbolt shows that it exists in ARM GCC trunk.
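
The claim that a direct branch from bar() back to foo() would be in range can be checked with a little arithmetic. The sketch below assumes a hypothetical layout in which foo() starts at address 0x0 and bar() follows directly after foo()'s literal pool at 0xC (four 2-byte instructions plus one 4-byte word); the real addresses depend on the linker. It uses the range of the 16-bit Thumb unconditional branch, whose offset is an even number from -2048 to +2046 relative to the branch address plus 4. The helper names are made up for illustration, and the sketch only shows that the branch is encodable; it says nothing about why GCC does not emit it.

```cpp
#include <cstdint>

// Offset range of the 16-bit Thumb `B <label>` instruction:
// an 11-bit immediate, shifted left by one and sign-extended.
constexpr std::int64_t kMinOffset = -2048;
constexpr std::int64_t kMaxOffset = 2046;

// Byte offset the branch would have to encode. In Thumb state the PC value
// used by a branch is the address of the branch instruction plus 4.
constexpr std::int64_t branch_offset(std::uint32_t branch_addr,
                                     std::uint32_t target_addr)
{
    return static_cast<std::int64_t>(target_addr) -
           (static_cast<std::int64_t>(branch_addr) + 4);
}

// True if the offset is even and within the 16-bit branch range.
constexpr bool fits_short_branch(std::int64_t offset)
{
    return offset % 2 == 0 && offset >= kMinOffset && offset <= kMaxOffset;
}

// Hypothetical layout (assumption): foo() at 0x0, bar() at 0xC.
constexpr std::uint32_t kFooAddr = 0x0;
constexpr std::uint32_t kBarAddr = 0xC;

static_assert(branch_offset(kBarAddr, kFooAddr) == -16,
              "a `b foo` placed at the start of bar() needs offset -16");
static_assert(fits_short_branch(branch_offset(kBarAddr, kFooAddr)),
              "offset -16 is well inside the 16-bit branch range");
```

Under that assumed layout the required offset is -16 bytes, far inside the ±2 KB window, which is consistent with the expectation in the report that branch range is not the obstacle here.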