Actually, we can use a "call-setup-gp" calling convention to avoid patching.
It works like this:
1) Each bytecode file contains a data section and a code section.
2) During load, the runtime constructs the data segment from the
data section: string objects from string data, floating-point
objects from raw bits, external references, etc. Data segments
are allocated from the heap; they are not part of the image.
NOTE: the loader does not modify anything in the data section, and
the code section is left as is.
3) The data segment ptr (dp) is always part of the stack frame,
alongside sp, pc, fp (frame ptr) and cp (code ptr). It is set up
for each stack frame automatically.
4) Bytecode should be 100% position independent and data
independent. Code references are made through relative branches.
Data references are made through slots in the data segment of the
current stack frame, whether mutable or immutable.
5) For every cross-module/package reference, we need to maintain
two pointers instead of one inside the data segment. For a data
reference, the first is the target's data segment ptr and the
second is the offset. For a code reference, the first is still
the data segment ptr and the second is the ptr into the code
section. These are set up during the load in step 2.
6) To call an external function, the caller must provide both dp
and cp to construct the new stack frame. In C/C++ land, the
dp is called gp (global ptr).
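
To make steps 3, 5 and 6 concrete, here is a rough C sketch of a frame
that carries dp and cp, and of a cross-module call made through a
(dp, cp) pair stored in the caller's data segment. All the names
(Slot, Frame, call_extern, lib_scale, app_main, demo) are made up for
illustration, and a C function pointer stands in for a pointer into a
code section. It is only a sketch under those assumptions, not a
proposed layout.

```c
#include <stddef.h>

/* Hypothetical types; none of these names come from the design itself. */
typedef struct Frame Frame;
typedef long (*Code)(Frame *f, long arg);   /* stands in for a code-section ptr */

typedef union Slot {
    long        i;    /* an immediate constant                          */
    union Slot *dp;   /* first half of a cross-module pair: data segment */
    Code        cp;   /* second half of a code reference: code ptr       */
} Slot;

struct Frame {
    Slot  *dp;   /* data segment of the module this frame runs in */
    Code   cp;   /* code being executed                            */
    Frame *fp;   /* caller's frame                                 */
};

/* Step 6: the caller supplies both dp and cp, read from two adjacent
 * slots of its own data segment, to build the callee's frame. */
static long call_extern(Frame *caller, int slot, long arg)
{
    Frame callee = { caller->dp[slot].dp, caller->dp[slot + 1].cp, caller };
    return callee.cp(&callee, arg);
}

/* "Library" module: multiplies by a constant held in ITS data segment. */
static long lib_scale(Frame *f, long arg)
{
    return arg * f->dp[0].i;
}

/* "Application" module: calls the library via slots 1-2, then adds a
 * constant from its own slot 0. Neither function contains an address. */
static long app_main(Frame *f, long arg)
{
    return call_extern(f, 1, arg) + f->dp[0].i;
}

/* What a loader would do in step 2, done by hand here. */
static long demo(void)
{
    Slot lib_data[1];
    Slot app_data[3];
    Frame root;

    lib_data[0].i  = 10;
    app_data[0].i  = 1;
    app_data[1].dp = lib_data;    /* pair, first pointer: callee's dp */
    app_data[2].cp = lib_scale;   /* pair, second pointer: callee's cp */

    root.dp = app_data;
    root.cp = app_main;
    root.fp = NULL;
    return root.cp(&root, 4);     /* 4 * 10 + 1 */
}
```

Calling demo() returns 41. The point is that lib_scale and app_main
only ever index off f->dp, so the same code works with any data
segment the loader hands it.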
With this convention, we don't have to patch anything, period.
However, we need to allocate and initialize the data segment
instead of patching in place. The upside is that we can use a
compressed, platform-independent data format for the data section
and not worry about sizes. For example, we can use BER format for ints.
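
As an illustration of the BER idea, here is a small C sketch. I am
assuming "BER" means the compressed-integer form that Perl's pack "w"
produces: base-128 digits, most significant first, with the high bit
set on every byte except the last. The function names ber_encode and
ber_decode are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode v as a BER compressed integer. out must have room for
 * 10 bytes (enough for 64 bits). Returns the number of bytes written. */
static size_t ber_encode(uint64_t v, unsigned char *out)
{
    unsigned char tmp[10];
    size_t n = 0;
    size_t i;

    do {                              /* collect 7-bit groups, low first */
        tmp[n++] = (unsigned char)(v & 0x7F);
        v >>= 7;
    } while (v != 0);

    for (i = 0; i < n; i++) {         /* emit high group first */
        unsigned char b = tmp[n - 1 - i];
        if (i + 1 < n)
            b |= 0x80;                /* continuation bit on all but last */
        out[i] = b;
    }
    return n;
}

/* Decode one BER compressed integer from in[0..len). Returns the number
 * of bytes consumed, or 0 if the input is truncated or overflows 64 bits. */
static size_t ber_decode(const unsigned char *in, size_t len, uint64_t *v)
{
    uint64_t acc = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (acc >> 57)                /* next 7-bit shift would overflow */
            return 0;
        acc = (acc << 7) | (uint64_t)(in[i] & 0x7F);
        if ((in[i] & 0x80) == 0) {    /* last byte of this integer */
            *v = acc;
            return i + 1;
        }
    }
    return 0;                         /* ran out of input */
}
```

With this encoding 300 takes two bytes (0x82 0x2C) and small values
take one, whatever the native int size or endianness of the machine
that wrote the file.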
With this scheme, we can share the same image among multiple
interpreters inside one address space.
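
A sketch of why the sharing works, again in C with invented names
(Image, DataSegment, load_data_segment, free_data_segment): the image
is only ever read, and each interpreter builds its own heap-allocated
data segment from it. For brevity the data section here holds only
string data; a real one would also carry numbers and external
references.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical read-only image: just raw string data in this sketch. */
typedef struct {
    const char *const *strings;
    size_t             nstrings;
} Image;

/* Per-interpreter data segment: heap copies the interpreter may mutate. */
typedef struct {
    char  **slots;
    size_t  nslots;
} DataSegment;

static void free_data_segment(DataSegment *ds)
{
    size_t i;

    if (ds == NULL)
        return;
    if (ds->slots != NULL) {
        for (i = 0; i < ds->nslots; i++)
            free(ds->slots[i]);
        free(ds->slots);
    }
    free(ds);
}

/* Build a fresh data segment from the image. Nothing in the image is
 * written to, so any number of interpreters can load from the same one. */
static DataSegment *load_data_segment(const Image *img)
{
    size_t i;
    DataSegment *ds = malloc(sizeof *ds);

    if (ds == NULL)
        return NULL;
    ds->nslots = img->nstrings;
    ds->slots  = calloc(ds->nslots, sizeof *ds->slots);
    if (ds->nslots != 0 && ds->slots == NULL) {
        free(ds);
        return NULL;
    }
    for (i = 0; i < ds->nslots; i++) {
        size_t n = strlen(img->strings[i]) + 1;
        ds->slots[i] = malloc(n);
        if (ds->slots[i] == NULL) {
            free_data_segment(ds);   /* frees the slots filled so far */
            return NULL;
        }
        memcpy(ds->slots[i], img->strings[i], n);
    }
    return ds;
}
```

Two interpreters that each call load_data_segment on the same Image
get independent segments: a write through one interpreter's slots is
invisible to the other, and the image stays untouched.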
By the way, this calling convention is used by Intel Itanium.
Hong
> Perl bytecode will have three sections:
>
> 1) Fixup section. RW. This has all the real-address pointers and
> suchlike things stored in it. It will be abused as needed when
> bytecode is loaded, and all the bytecode that needs to deal with
> things will use fixed-position slots in the fixup section to vector
> to the real things.
>
> 2) Constants section. RO. Holds constants. (I bet that was a
> surprise... :) Things like string data and integer constants and
> such. The loader mangles the fixup section to point to constants
> here.
>
> 3) Instruction section. RO. Holds the actual bytecode. Everything
> here's position independent--it either refers to things relative to
> the current location (for branches within code, for example), in the
> fixup section, or symbolically.
>
> The constants and instruction sections might be write-once if the
> bytecode's not in platform-native integer/float format. (Wrong
> endianness or floating point format mainly) If we need to, we'll
> read in and byteswap the whole thing, and then leave it read-only.
> We'd rather not, though, if we can manage, since that means we need
> to touch, and I hate touching.
>
> There might be separate source, syntax tree, and unoptimized bytecode
> sections as well, but generally they'd stay on disk. (We might
> overlap the constants and source sections so that if you had some
> monster string constant we wouldn't have two copies--the one in your
> source and the one in the constants section)
>
> Dan
>
> --------------------------------------"it's like this"-------------------
> Dan Sugalski                          even samurai
> [EMAIL PROTECTED]                     have teddy bears and even
>                                       teddy bears get drunk
>