https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116521
Bug ID: 116521
Summary: missing optimization: xtensa tail-call
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rsaxvc at gmail dot com
Target Milestone: ---
With GCC 12.2.0 and -O2 -Wall -Wextra, the following code:
#include <stdint.h>
__attribute__ ((noinline)) uint32_t callee(uint32_t x, uint16_t y){
    return x + y;
}
__attribute__ ((noinline)) uint32_t caller(uint32_t x, uint32_t y){
    return callee(x, y);
}
compiles to these xtensa instructions:
callee:
    entry sp, 32
    extui a3, a3, 0, 16
    add.n a2, a3, a2
    retw.n
caller:
    entry sp, 32
    extui a11, a3, 0, 16
    mov.n a10, a2
    call8 callee
    mov.n a2, a10
    retw.n
If caller were to tail-call callee, the output could be much closer to the
following ARM code (in other words, caller would not need to manipulate the
register windows):
callee:
    add r0, r0, r1
    bx lr
caller:
    uxth r1, r1 // similar to extui, .., .., 0, 16
    b callee
On xtensa, this might mean that the arguments end up in different registers in
caller(). I'm not sure whether the caller or the callee is responsible for
rotating the window. The optimization may only apply when the caller and the
callee take the same number of arguments. It's also possible that I'm
misunderstanding the mechanism.
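
To help narrow down whether the argument mismatch matters, a variant along the
following lines might be worth compiling with the same flags. This is only a
sketch: the names callee_same, caller_same and caller_extra are hypothetical
and not part of the original test case, and no particular compiler output is
claimed for them.

```c
#include <stdint.h>

/* Hypothetical variant: both functions take (uint32_t, uint32_t), so the
   caller has no narrowing conversion to perform before the call. */
__attribute__ ((noinline)) uint32_t callee_same(uint32_t x, uint32_t y){
    return x + y;
}

__attribute__ ((noinline)) uint32_t caller_same(uint32_t x, uint32_t y){
    return callee_same(x, y); /* call in tail position */
}

/* Hypothetical variant: the caller takes one more argument than the callee. */
__attribute__ ((noinline)) uint32_t caller_extra(uint32_t x, uint32_t y,
                                                 uint32_t unused){
    (void)unused; /* silence -Wextra's unused-parameter warning */
    return callee_same(x, y);
}
```

If caller_same ends in a jump to callee_same rather than call8 followed by
retw.n, the mismatch in argument types is probably what blocks the tail call
in the original case. If it still uses call8, the limitation is more likely in
how the windowed ABI is handled.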