https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

--- Comment #6 from Petr Skocik <pskocik at gmail dot com> ---
(In reply to Jakub Jelinek from comment #2)
> (In reply to Petr Skocik from comment #1)
> > Sidenote regarding the stack-allocating code for cases when the size is not
> > known to be less than the page size: the code generated for those cases is
> > quite large. It could be replaced (at least under -Os) with a call to a
> > special assembly function that'd pop the return address (assuming the target
> > machine pushes return addresses to the stack), adjust and allocate the stack
> > space piecemeal so as not to skip any guard page, then repush the return
> > address and return to the caller with the stack expanded.
> 
> You certainly don't want to kill the return stack the CPU has, even if it
> results in a few saved bytes for -Os.

That's a very interesting point, because I have written x86_64 assembly
"functions" that pop the return address, push something to the stack, and
then repush the return address and return. Measured in a loop, they don't
seem to perform badly compared to inline code, so I figure they aren't
disturbing the return stack buffer. After all, even though the return happens
through a different place in the call stack, it still returns to the original
caller.

The one time I almost certainly did mess with the return stack buffer was
when I wrote a context-switching routine and originally tried to "ret" into
the new context. That turned out to be measurably many times slower than
`pop %rcx; jmp *%rcx;` (also measured in a loop), which is why I think
popping the return address, allocating on the stack, and then repushing and
returning is not really a performance killer (on my Intel CPU, anyway). If it
were disturbing the return stack buffer, I would expect slowdowns similar to
what I got from the context-switching code trying to `ret`.
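For illustration, a rough sketch of the kind of out-of-line helper I had in
mind (AT&T-syntax x86_64; the name __alloca_probe, the size-in-%r10
convention, and the lack of alignment handling are all illustrative, this is
not what GCC emits): it pops its own return address, lowers %rsp one page at
a time while touching each page so a guard page cannot be skipped, then
repushes the return address and returns with the caller's stack extended:

            .globl  __alloca_probe
    __alloca_probe:                       # %r10 = requested size (illustrative)
            pop     %r11                  # pop our own return address
    .Lpage:
            cmp     $4096, %r10
            jb      .Llast
            sub     $4096, %rsp
            orq     $0, (%rsp)            # touch the page so a guard page faults
            sub     $4096, %r10
            jmp     .Lpage
    .Llast:
            sub     %r10, %rsp            # remaining bytes (alignment ignored here)
            push    %r11                  # repush the return address
            ret                           # return with %rsp already lowered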
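And for the context-switch comparison, a minimal hypothetical sketch (a real
switch would save all callee-saved registers; %rdi = &old_sp, %rsi = new_sp):
the last two instructions are the `pop %rcx; jmp *%rcx` that replaced the
plain `ret` into the new context, which was many times slower for me:

            .globl  switch_stack
    switch_stack:
            push    %rbp
            push    %rbx                  # only two callee-saved regs, for brevity
            mov     %rsp, (%rdi)          # publish the old context's stack pointer
            mov     %rsi, %rsp            # switch to the new context's stack
            pop     %rbx
            pop     %rbp
            pop     %rcx                  # take the resume address off the new stack
            jmp     *%rcx                 # jump instead of `ret`, so the return
                                          # stack buffer is not consulted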
