Clifford Wolf wrote:



Hmm, what about doing it GC-like? Instead of a stack there is simply a
'pool' of trampolines from which trampolines are allocated, and a pointer to
the trampoline is pushed on the stack.

When the last trampoline from the pool is allocated, a 'garbage collector'
runs and looks for pointers to trampolines between the stack pointer and the
stack start address. Every trampoline which isn't possibly referenced is
added to a free list from which new trampolines are allocated.
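
A minimal sketch of such a pool in C, under some assumptions of mine: the names
(pool_tramp, pool_alloc, pool_collect) and the pool size are hypothetical, the
stack range to scan is passed in explicitly as [lo, hi) instead of being read
from the stack pointer, and the collector runs when an allocation finds the
free list empty rather than right after the last entry is handed out. The scan
is conservative: any word that looks like an address inside the pool keeps that
trampoline alive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4  /* hypothetical size, kept tiny for illustration */

struct pool_tramp {
    void (*fn)(void);             /* bare function address */
    void *static_chain;           /* frame of the enclosing function */
    struct pool_tramp *next_free; /* link used only while on the free list */
    int marked;                   /* set when a stack word may point here */
};

static struct pool_tramp pool[POOL_SIZE];
static struct pool_tramp *free_list;

/* Conservatively scan the words in [lo, hi) and rebuild the free list
   from every pool entry that no scanned word points into. */
static void pool_collect(const uintptr_t *lo, const uintptr_t *hi)
{
    uintptr_t base = (uintptr_t)pool;
    size_t i;

    assert(lo <= hi);
    for (i = 0; i < POOL_SIZE; i++)
        pool[i].marked = 0;
    for (const uintptr_t *p = lo; p < hi; p++)
        if (*p >= base && *p < base + sizeof pool)
            pool[(*p - base) / sizeof pool[0]].marked = 1;

    free_list = NULL;
    for (i = 0; i < POOL_SIZE; i++)
        if (!pool[i].marked) {
            pool[i].next_free = free_list;
            free_list = &pool[i];
        }
}

/* Allocate a trampoline; collect first if the free list is empty.
   Returns NULL when every entry is still possibly referenced. */
static struct pool_tramp *pool_alloc(void (*fn)(void), void *chain,
                                     const uintptr_t *lo, const uintptr_t *hi)
{
    struct pool_tramp *t;

    if (!free_list)
        pool_collect(lo, hi);
    if (!free_list)
        return NULL;
    t = free_list;
    free_list = t->next_free;
    t->fn = fn;
    t->static_chain = chain;
    return t;
}
```

In a real implementation lo would be the current stack pointer and hi the
stack start address; registers holding trampoline pointers would also have to
be considered, which this sketch ignores.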

If you have only one processor stack (i.e. single-threaded execution), you can handle
the trampolines as a stack too. You don't need to deallocate until you allocate again;
at that point you adjust the trampoline stack so that none of its static chain pointers
points to a deallocated frame, or to the current frame (since you are only just about
to set up the trampolines for the current frame).
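
Roughly, in C, and assuming a downward-growing processor stack so that a
deallocated frame always has a lower address than any live one. The names
(stack_tramp, tramp_trim, tramp_push) and the capacity are hypothetical, and
frame addresses are plain integers to keep the sketch self-contained:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRAMP_MAX 32  /* hypothetical capacity */

struct stack_tramp {
    void (*fn)(void);        /* bare function address */
    uintptr_t static_chain;  /* address of the frame the trampoline belongs to */
};

static struct stack_tramp tramp_stack[TRAMP_MAX];
static size_t tramp_top;     /* number of live entries */

/* Called once on entry to a function that needs trampolines, before it
   sets any up. Pops every entry whose static chain points at a frame at
   or below 'frame': those frames are gone (or are a stale earlier
   incarnation of the current one), given a downward-growing stack. */
static void tramp_trim(uintptr_t frame)
{
    while (tramp_top > 0 && tramp_stack[tramp_top - 1].static_chain <= frame)
        tramp_top--;
}

/* Push a trampoline for the current frame. Returns NULL if the
   trampoline stack is full. */
static struct stack_tramp *tramp_push(void (*fn)(void), uintptr_t frame)
{
    struct stack_tramp *t;

    /* Entries must stay ordered: outer (higher) frames below inner ones. */
    assert(tramp_top == 0 || tramp_stack[tramp_top - 1].static_chain >= frame);
    if (tramp_top == TRAMP_MAX)
        return NULL;
    t = &tramp_stack[tramp_top++];
    t->fn = fn;
    t->static_chain = frame;
    return t;
}
```

No work is done when a function returns; stale entries simply stay around
until the next function that needs trampolines calls tramp_trim.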


If you have multiple processor stacks, you have to register and later search them all in
order to make the garbage-collection scheme work.
that it doesn't point at any deallocated frames.


Instead of adding the trampoline pool to libgcc (as suggested earlier in
this thread) I would suggest that gcc generates a trampoline pool in a
linkonce section every time a source file is compiled which requires
trampolines. That way there wouldn't be any trampoline pool in an
executable which doesn't need one


You don't need a linkonce section for this. The function that needs a trampoline
calls allocation / deallocation functions, or if it inlines the code, it will reference
the pool start addresses - either way, it will reference some symbols. By putting the
.o file that provides these symbols along with the code and data parts of the trampoline
pool into a static library - libgcc.a or otherwise - you make sure that the object is only
linked in when needed.


and a compiler option such as
-ftrampoline-pool-size=32 could be used to specify the size of the
trampoline pool on the command line.

This is messy; say you have two libraries that are compiled with -ftrampoline-pool-size=32;
they will then share a trampoline pool of 32 entries. If you compile one with
-ftrampoline-pool-size=16 instead, you will have them using different pools, or maybe
even get some multiply defined symbols.
It is much saner to make this a link-time option. By selecting a specific library for the
trampoline pool, you can adjust the size on a per-program (or per-DSO, if you don't export it)
basis, and you might even choose an alternate allocation strategy. I.e. you could have
libgcc provide one with a size that works most of the time and uses destructors for
portability and robustness, have a specialized lightweight one you can specifically use for
single-threaded programs, and have a 64-bit Linux-specific one that ties into the threading code
(or is part of a threads package) and mmaps trampoline code pages for every processor stack
allocated, sufficiently large and at a fixed offset to the stack so that you can put the data part
on the return stack in any suitably aligned position, and have a matching trampoline.


I.e. the bare function address and the static chain pointer are 8 bytes each, so that a trampoline
data part is 16 bytes. You require them to be 16-byte aligned on any processor stack.
The mmapped trampoline can be an absolute function call to some helper code that does the
real work, using the return address to figure out which trampoline is executed. This call should
fit into 16 bytes too, so in the trampoline page to be mmapped, every 16 bytes there is such an
absolute call insn. You can get a 1:1 correspondence between trampolines and processor stacks
by allocating the stacks all in one specific memory area, and have an equally-sized area where
trampolines are mapped. Thus, you can have differently-sized stacks, yet the trampoline code
can add a constant offset to the return address to find the data part of the trampoline.
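
The address arithmetic in the helper could look roughly like this in C. This is
a sketch under assumptions of mine: addresses are modelled as 64-bit integers,
the function name and parameters are hypothetical, both areas start on a
16-byte boundary, and the call insn starts at the beginning of its 16-byte slot
and is between 1 and 16 bytes long, so the return address minus one still lies
inside the slot.

```c
#include <assert.h>
#include <stdint.h>

#define TRAMP_SLOT 16u  /* one call insn per 16 bytes; data part is 16 bytes too */

/* Data part of a trampoline as it sits on the processor stack. */
struct tramp_data {
    uint64_t fn;            /* bare function address */
    uint64_t static_chain;  /* static chain pointer */
};

/* Given the return address pushed by the call insn in the trampoline
   area, find the matching data part in the stack area. 'tramp_area' and
   'stack_area' are the start addresses of the two equally-sized areas. */
static uint64_t tramp_data_addr(uint64_t ret_addr,
                                uint64_t tramp_area, uint64_t stack_area)
{
    /* Round (ret_addr - 1) down to the start of the slot holding the call. */
    uint64_t slot = (ret_addr - 1) & ~(uint64_t)(TRAMP_SLOT - 1);

    assert(sizeof(struct tramp_data) == TRAMP_SLOT);
    assert(slot >= tramp_area);
    /* Constant offset between the two areas (wraps correctly modulo 2^64). */
    return slot + (stack_area - tramp_area);
}
```

The helper would then load fn and static_chain from the resulting address, set
up the static chain register, and jump to fn.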




