RE: Opcode Dispatch

Hong Zhang Mon, 06 Aug 2001 14:07:06 -0700

Actually, we can use a "call-setup-gp" calling convention to avoid
patching.

It works like this:

1) each bytecode contains a data section and a code section.

2) during load, the runtime constructs the data segment from the
  data section, such as string objects from string data, floating
  point objects from raw bits, external references etc. The data
  segments come from the heap; they are not part of the image.

  NOTE: it will not touch anything in the data section. The code
  section will be left as is.

3) the data segment ptr is always part of the stack frame, just
  like sp, pc, dp (data ptr), fp (frame ptr), cp (code ptr). It
  will be set up for each stack frame automatically.

4) byte code should be 100% position independent and data
  independent. Code reference is done by relative branches. Data
  reference is done by referencing slots in the data segment of
  the current stack frame, be it mutable or immutable.

5) for all cross module/package references, we need to maintain
  two pointers instead of one inside the data segment. For a data
  reference, the first pointer is the data segment ptr, the
  second one will be the offset. For a code reference, the first
  pointer will still be the data segment, the second pointer is
  the ptr to the code section. These are set up during step 2
  (load).

6) to call external functions, the caller must provide both dp 
  and cp to construct the new stack frame. In C/C++ land, the
  dp is called gp (global ptr).
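
The six steps above can be sketched roughly as follows. This is
only an illustration in Python, not a proposed implementation:
the Image and Frame classes, the load/link/run helpers and the
tiny opcode set (load, call_ext, concat, ret) are all made up
for the example.

```python
import struct
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    """Read-only bytecode image; loading never writes to it."""
    data_section: tuple  # (kind, raw bytes) pairs
    code_section: tuple  # instructions that only name data-segment slots


@dataclass
class Frame:
    dp: list   # data segment ptr for this frame
    cp: tuple  # code section ptr for this frame
    pc: int = 0
    stack: list = field(default_factory=list)


def load(image):
    """Step 2: build a fresh heap data segment from the data section."""
    segment = []
    for kind, raw in image.data_section:
        if kind == "str":
            segment.append(raw.decode("utf-8"))
        elif kind == "float":
            segment.append(struct.unpack(">d", raw)[0])
        else:
            raise ValueError(f"unknown constant kind: {kind}")
    return segment


def link(segment, callee_dp, callee_cp):
    """Step 5: a cross-module code reference is a (dp, cp) pair in a slot."""
    segment.append((callee_dp, callee_cp))
    return len(segment) - 1


def run(frame):
    """Execute one frame; every data access goes through frame.dp."""
    while True:
        op, *args = frame.cp[frame.pc]
        frame.pc += 1
        if op == "load":
            frame.stack.append(frame.dp[args[0]])
        elif op == "call_ext":
            # Step 6: the caller supplies both dp and cp for the new frame.
            callee_dp, callee_cp = frame.dp[args[0]]
            frame.stack.append(run(Frame(dp=callee_dp, cp=callee_cp)))
        elif op == "concat":
            right = frame.stack.pop()
            left = frame.stack.pop()
            frame.stack.append(left + right)
        elif op == "ret":
            return frame.stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")


lib = Image(data_section=(("str", b"world"),),
            code_section=(("load", 0), ("ret",)))
main = Image(data_section=(("str", b"hello "),),
             code_section=(("load", 0), ("call_ext", 1),
                           ("concat",), ("ret",)))

lib_dp = load(lib)
main_dp = load(main)
link(main_dp, lib_dp, lib.code_section)  # lands in slot 1 of main_dp
result = run(Frame(dp=main_dp, cp=main.code_section))
```

Nothing in either Image is modified: the only mutable state is
the per-interpreter data segment returned by load().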

With this convention, we don't have to patch anything, period.
However, we need to allocate and initialize the data segment
instead of patching in place. The upside is that we can use a
compressed, platform-independent data format for the data section
and not worry about sizes. For example, we can use BER format for
ints. With this scheme, we can share the same image between
multiple interpreters inside one address space.
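
To make the BER remark concrete, here is a small sketch, assuming
"BER format" means the base-128 variable-length encoding that
Perl's pack "w" template produces (most significant group first,
high bit set on every byte except the last). The function names
ber_encode and ber_decode are invented for the example.

```python
def ber_encode(n):
    """Encode a non-negative int as base-128 groups, big end first."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    groups = [n & 0x7F]            # last byte: continuation bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)  # earlier bytes: bit set
        n >>= 7
    return bytes(reversed(groups))


def ber_decode(buf, pos=0):
    """Decode one integer starting at pos; return (value, next_pos)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:        # continuation bit clear: last byte
            return value, pos
```

Small values take one byte and the encoding has no byte order to
get wrong, which is why the data section need not care about the
native int size.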

By the way, this calling convention is used by Intel Itanium.

Hong

> Perl bytecode will have three sections:
> 
> 1) Fixup section. RW. This has all the real-address pointers and
> suchlike things stored in it. It will be abused as needed when
> bytecode is loaded, and all the bytecode that needs to deal with
> things will use fixed-position slots in the fixup section to vector
> to the real things.
> 
> 2) Constants section. RO. Holds constants. (I bet that was a
> surprise... :) Things like string data and integer constants and
> such. The loader mangles the fixup section to point to constants
> here.
> 
> 3) Instruction section. RO. Holds the actual bytecode. Everything
> here's position independent--it either refers to things relative to
> the current location (for branches within code, for example), in the
> fixup section, or symbolically.
> 
> The constants and instruction sections might be write-once if the
> bytecode's not in platform-native integer/float format. (Wrong
> endianness or floating point format mainly) If we need to, we'll
> read in and byteswap the whole thing, and then leave it read-only.
> We'd rather not, though, if we can manage, since that means we need
> to touch, and I hate touching.
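
The read-and-byteswap-once fallback quoted above could look
something like this. It is only a sketch: the MAGIC value, the
32-bit word layout and the load_words helper are assumptions made
up for the example, not anything from the actual bytecode format.

```python
import struct
import sys

MAGIC = 0x013155A1  # hypothetical marker stored in the first word
FOREIGN = ">" if sys.byteorder == "little" else "<"  # the other order


def load_words(raw):
    """Return (words, touched): native-order 32-bit words, swap flag."""
    if len(raw) % 4:
        raise ValueError("image length is not a multiple of 4 bytes")
    count = len(raw) // 4
    words = struct.unpack(f"={count}I", raw)
    if words[0] == MAGIC:
        return words, False        # already native: nothing to touch
    swapped = struct.unpack(f"{FOREIGN}{count}I", raw)
    if swapped[0] != MAGIC:
        raise ValueError("not a bytecode image")
    return swapped, True           # converted once, read-only afterwards
```

The touched flag is the cost being complained about: a native
image can be used as mapped, a foreign one has to be copied.
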
> 
> There might be separate source, syntax tree, and unoptimized
> bytecode sections as well, but generally they'd stay on disk. (We
> might overlap the constants and source sections so that if you had
> some monster string constant we wouldn't have two copies--the one in
> your source and the one in the constants section)
> 
>                                       Dan
> 
> --------------------------------------"it's like this"-------------------
> Dan Sugalski                          even samurai
> [EMAIL PROTECTED]                     have teddy bears and even
>                                       teddy bears get drunk
>