or our runloops are wrong
  or deep core stuff

All run loops get a pointer to the parrot byte code for execution.
This has several impacts on the runloop itself and on branching and
jumping between instructions.
As Parrot PASM jumps are expressed in terms of opcodes (absolute or
relative), each runloop has its own calculation routines for
branches.

Proposal:

Runloops should get the opcode offset from the start of the byte code
as the parameter for running, not the actual address of the byte code
to run.

Current typical run loop         Proposed
------------------------------------------------------------------

  while(pc)                       while(offs)
    DO_OP(pc, interpreter);         DO_OP(offs, interpreter);

Changing this would seem to prohibit a jump to byte code offset 0: such
a jump would end the run loop (or the loop would not even start
normally :-)
Solution: Instruction 0 in the opcode stream is always HALT(), and all
bytecode starts at offset 1.
Advantage for e.g. JIT: exiting the runloop is centralized in one
defined place.
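
A minimal sketch of such an offset-driven loop, for illustration only.
The types, op names (MINI_*) and the tiny op table are hypothetical and
not Parrot's; the sketch only assumes that every op function returns the
offset of the next op, and that offset 0 holds HALT and is never a
branch target.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;

/* Hypothetical toy op set, not Parrot's real opcodes. */
enum { MINI_HALT = 0, MINI_INC = 1, MINI_BRANCH = 2 };

typedef struct MiniInterp {
    const opcode_t *code_start;  /* base of the byte code */
    long counter;                /* toy state, bumped by MINI_INC */
} MiniInterp;

typedef size_t (*mini_op_t)(size_t offs, MiniInterp *interp);

/* Every op returns the offset of the next op; 0 ends the loop. */
static size_t mini_halt(size_t offs, MiniInterp *interp)
{
    (void)offs; (void)interp;
    return 0;
}

static size_t mini_inc(size_t offs, MiniInterp *interp)
{
    interp->counter++;
    return offs + 1;
}

/* Relative branch: the operand is a distance in opcodes, so the
 * target is plain offset arithmetic, no pointer conversion needed. */
static size_t mini_branch(size_t offs, MiniInterp *interp)
{
    return (size_t)((long)offs + interp->code_start[offs + 1]);
}

static const mini_op_t mini_ops[] = { mini_halt, mini_inc, mini_branch };

long run_from_offset(MiniInterp *interp, size_t offs)
{
    const opcode_t *code_start = interp->code_start;

    /* The convention from the proposal: instruction 0 is always HALT. */
    assert(code_start[0] == MINI_HALT);
    while (offs)
        offs = mini_ops[code_start[offs]](offs, interp);
    return interp->counter;
}

long demo_offset_runloop(void)
{
    static const opcode_t code[] = {
        MINI_HALT,       /* offset 0: reserved */
        MINI_INC,        /* offset 1: program entry */
        MINI_INC,        /* offset 2 */
        MINI_BRANCH, 3,  /* offsets 3,4: branch to 3 + 3 = 6 */
        MINI_INC,        /* offset 5: skipped by the branch */
        MINI_HALT        /* offset 6: returns 0, loop ends */
    };
    MiniInterp interp = { code, 0 };

    return run_from_offset(&interp, 1);
}
```

Here the program starts at offset 1, and the only way out of the loop is
an op returning 0, which is the "one defined place" mentioned above.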

Consideration for individual runloops:

runops_fast_core:

This is the standard run loop in the absence of
CGoto and is exactly the typical runloop above:

#define DO_OP(PC,INTERP) \
 (PC = ((INTERP->op_func_table)[*PC])(PC,INTERP))

The addressing in the op_func_table would need a change to

------------------------------------------------------------------
                       offs = Itp->f_tbl[*(code_start+offs)](offs, ..)

This seems more expensive, but compilers should convert it to a base
indexed instruction; it probably depends on how code_start is set up [1].
E.g.

 code_start = interpreter->code->base.data; // new syntax
 while (offs)
      offs = interp->func_table[*(code_start+offs)](offs, ..)

runops_slow_core:
As above.

CGoto (cg_core):
Similar:

------------------------------------------------------------------
goto *ops_addr[*cur_opcode];      goto *ops_addr[*(code+offs)];


runops_prederef:

Similar runloop, but it has to recalculate offsets back and forth
between the two code pointers it needs. The addressing of operands is
done relative to the PC and would then be relative to code_start, so
no change (in terms of cost) here:

  (*(INTVAL *)cur_opcode[1]) = (*(INTVAL *)cur_opcode[2]);
   return cur_opcode + 3;

runops_jit:

Addressing is either totally internal or based on offsets, which have
to be recalculated every time external (non-JITed) code that might
cause a control flow change is called. This is similar to:

run_compiled:

Has a switch-based runloop with offsets, but a linear code
representation, i.e. consecutive instructions of one block don't have
a runloop lookup, they are just appended (giving a bigger runtime image
but faster execution). It can't do forward jumps or restart operations,
including eval.

------------------------------------------------------------------
 switch(cur_opcode - start_code)   switch(offset)


Special: invoke vtable call

This too has a pointer to the byte code of the current run loop. Due
to the dynamic nature of pmc->vtable->invoke, this instruction
might be intended to go anywhere. This is currently done, e.g. for
eval, with nasty tricks (leaving the inner run loop, then switching
code segments, then reentering the runloop).

Passing the offset of the branch address (and adjusting code_start)
would then be enough to go anywhere in byte code.
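
A rough sketch of that idea, under loud assumptions: the VM struct, the
SEG_* ops and the two hard-wired segments are made up for illustration,
and a real invoke would get its target segment from the PMC. The invoke
op swaps code_start and returns an offset into the new segment, so the
loop never has to be left and reentered. Note that the loop re-reads
code_start on every iteration, and that this toy does a one-way jump
(no return to the caller).

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;

/* Hypothetical toy ops, not Parrot's. */
enum { SEG_HALT = 0, SEG_ADD = 1, SEG_INVOKE = 2 };

typedef struct VM {
    const opcode_t *code_start;  /* base of the *current* segment */
    long acc;
} VM;

/* Segment B, the "callee": add 10, then halt. Offset 0 is HALT. */
static const opcode_t seg_b[] = { SEG_HALT, SEG_ADD, 10, SEG_HALT };

/* Segment A: add 1, then invoke offset 1 of segment B. */
static const opcode_t seg_a[] = {
    SEG_HALT, SEG_ADD, 1, SEG_INVOKE, 1, SEG_HALT
};

static size_t seg_halt(size_t offs, VM *vm)
{
    (void)offs; (void)vm;
    return 0;
}

static size_t seg_add(size_t offs, VM *vm)
{
    vm->acc += vm->code_start[offs + 1];
    return offs + 2;
}

/* Switch segments in place: adjust code_start, return the new offset.
 * The target segment is hard-wired to seg_b here (an assumption). */
static size_t seg_invoke(size_t offs, VM *vm)
{
    size_t target = (size_t)vm->code_start[offs + 1];

    vm->code_start = seg_b;
    assert(vm->code_start[0] == SEG_HALT);
    return target;
}

static size_t (*const seg_ops[])(size_t, VM *) = {
    seg_halt, seg_add, seg_invoke
};

long run_segments(void)
{
    VM vm = { seg_a, 0 };
    size_t offs = 1;

    /* code_start is read from vm each time, since an op may change it. */
    while (offs)
        offs = seg_ops[vm.code_start[offs]](offs, &vm);
    return vm.acc;
}
```

Running this adds 1 in segment A and 10 in segment B inside a single
loop, without leaving it in between.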

[1] Execution speed:

I did a short test and changed runops_fast_core to simulate
addressing relative to code start:

#undef DO_OP
#  define DO_OP(PC,INTERP) \
    (PC = ((INTERP->op_func_table)[*(_cs+PC)])(PC,INTERP))

opcode_t *
runops_fast_core(struct Parrot_Interp *interpreter, opcode_t *pc)
{
    int _cs = 0;	// interpreter->code->base.data
    while (pc) {
        DO_OP(pc, interpreter);
    }
    return pc;
}

There is *no* noticeable change in execution speed in the mops program.


Summary:

All addressing in PASM is done in terms of opcodes, while addressing in
the runloops is done via absolute code pointers. This makes it necessary
to recalculate opcode offsets for each branch that leaves the runloop,
or after returning from such external code.
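
The recalculation in question boils down to the following pointer
arithmetic, shown here as a small sketch. The helper names are
hypothetical; Parrot's cores each do this in their own way.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;

/* Absolute code pointer -> opcode offset relative to code_start. */
size_t pc_to_offset(const opcode_t *code_start, const opcode_t *pc)
{
    assert(pc >= code_start);
    return (size_t)(pc - code_start);
}

/* Opcode offset -> absolute code pointer, needed on reentry. */
const opcode_t *offset_to_pc(const opcode_t *code_start, size_t offs)
{
    return code_start + offs;
}

/* Round trip, as done when leaving and reentering a pointer-based loop. */
size_t roundtrip_demo(void)
{
    static const opcode_t code[8] = { 0 };
    const opcode_t *pc = offset_to_pc(code, 5);

    return pc_to_offset(code, pc);
}
```

With offsets as the native representation, both conversions disappear
from the places where control leaves or reenters a runloop.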

Changing the addressing scheme to opcode offsets relative to code
start would simplify all kinds of (non-local) control flow changes. As
real-world programs mostly consist of such subroutine calls, these
would be simplified a lot (and would then not need to leave the runloop
- probably ;-)

The "fast" run loops (compiled C and JIT) would take most advantage of
this change.

Comments welcome
leo
