or our runloops are wrong or deep core stuff

All run loops get a pointer to the Parrot byte code for execution. This has several impacts on the runloop itself and on branching and jumping between instructions. As Parrot PASM jumps are expressed in terms of opcodes (absolute or relative), every runloop has its own calculation routines for branches.

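To make the current scheme concrete, here is a minimal toy sketch of a pointer-based runloop. It is not Parrot source: the ToyInterp struct, the four toy ops and toy_run_pointer_loop() are made up for illustration, and only the shape of DO_OP follows the macro quoted further down. Note how the absolute jump has to turn an opcode offset back into an address, which is the kind of per-runloop branch calculation meant above.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;
typedef struct ToyInterp ToyInterp;
typedef opcode_t *(*toy_op_t)(opcode_t *pc, ToyInterp *interp);

/* Hypothetical stand-in for the interpreter struct (not Parrot's). */
struct ToyInterp {
    const toy_op_t *op_func_table;
    opcode_t       *code_start;
    long            counter;
};

enum { TOY_END, TOY_DEC, TOY_BRANCH_IF, TOY_JUMP };

/* end: returning NULL terminates the while(pc) loop */
static opcode_t *toy_end(opcode_t *pc, ToyInterp *in)
{ (void)pc; (void)in; return NULL; }

static opcode_t *toy_dec(opcode_t *pc, ToyInterp *in)
{ in->counter--; return pc + 1; }

/* relative branch: operand is an opcode count relative to this op */
static opcode_t *toy_branch_if(opcode_t *pc, ToyInterp *in)
{ return in->counter > 0 ? pc + pc[1] : pc + 2; }

/* absolute jump: operand is an opcode offset, so it must be
 * converted to an address using the code start */
static opcode_t *toy_jump(opcode_t *pc, ToyInterp *in)
{ return in->code_start + pc[1]; }

static const toy_op_t toy_ops[] = { toy_end, toy_dec, toy_branch_if, toy_jump };

#define DO_OP(PC, INTERP) \
    (PC = ((INTERP)->op_func_table[*(PC)])((PC), (INTERP)))

/* Runs a small countdown program; returns the final counter and
 * stores the number of dispatched ops in *ops_run. */
long toy_run_pointer_loop(long start_count, long *ops_run)
{
    opcode_t code[] = {
        TOY_JUMP, 3,          /* 0: jump to offset 3          */
        TOY_END,              /* 2: skipped by the jump       */
        TOY_DEC,              /* 3: counter--                 */
        TOY_BRANCH_IF, -1,    /* 4: back to 3 while counter>0 */
        TOY_END               /* 6: leave the runloop         */
    };
    ToyInterp in = { toy_ops, code, start_count };
    opcode_t *pc = code;

    *ops_run = 0;
    while (pc) {
        assert(pc >= code && pc < code + sizeof code / sizeof code[0]);
        ++*ops_run;
        DO_OP(pc, &in);
    }
    return in.counter;
}
```

Calling toy_run_pointer_loop(3, &n) from a main() of your own counts the counter down to 0; the loop only knows addresses, so every branch op does its own pointer arithmetic.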
Proposal:

Runloops should get the opcode offset relative to the start of the byte code as the parameter for running, not the actual address of the byte code to run.

Current typical run loop            Proposed
------------------------------------------------------------------
while(pc)                           while(offs)
    DO_OP(pc, interpreter);             DO_OP(offs, interpreter)

Changing this would seem to prohibit a jump to byte code offset 0: this would end the run loop (or it would not even start normally :-)

Solution:
Instruction 0 in the opcode stream is always HALT(), all bytecode starts at offset 1.

Advantage for e.g. JIT: exiting the runloop is centralized in one defined place.

Consideration for individual runloops:

runops_fast_core:
This is the standard run loop in the absence of CGoto and is exactly the typical runloop above:

    #define DO_OP(PC,INTERP) (PC = ((INTERP->op_func_table)[*PC])(PC,INTERP))

The addressing in the op_func_table would need a change to
------------------------------------------------------------------
                                    offs = Itp->f_tbl[*(code_start+offs)](offs, ..)

This seems more expensive, but compilers should convert it to some base indexed instruction; it probably depends on how code_start is set up [1]. E.g.

    code_start = interpreter->code->base.data;    // new syntax
    while (offs)
        offs = interp->func_table[*(code_start+offs)](offs, ..)

runops_slow_core:
As above.

CGoto (cg_core):
Similar:
------------------------------------------------------------------
goto *ops_addr[*cur_opcode];        goto *ops_addr[*(code+offs)];

runops_prederef:
Similar runloop, but it has to recalculate offsets back and forth between the two code pointers it needs. The addressing of operands is done relative to the PC and would then be relative to code_start, so no change (in terms of costs) here:

    (*(INTVAL *)cur_opcode[1]) = (*(INTVAL *)cur_opcode[2]);
    return cur_opcode + 3;

runops_jit:
Addressing is either totally internal or based on offsets, which have to be recalculated every time external (non JITed) code is called that might cause a control flow change.

This is similar to:

run_compiled:
Has a switch based runloop with offsets, but a linear code representation, i.e. consecutive instructions of one block don't have a runloop lookup, they are just appended (giving a bigger runtime image but faster execution). Can't do forward jumps and restart operations including eval.
------------------------------------------------------------------
switch(cur_opcode - start_code)     switch(offset)

Special: invoke vtable call
This too has a pointer to the byte code of the current run loop. Due to the dynamic nature of pmc->vtable->invoke, this instruction might be intended to go anywhere. This is currently done e.g. for eval, with nasty tricks (leaving the inner run loop, then switching code segments, reentering the runloop). Passing the offset of the branch address (and adjusting code_start) would then be enough to go anywhere in byte code.

[1] Execution speed:
I did a short test and changed runops_fast_core to simulate addressing relative to code start:

    #undef DO_OP
    #define DO_OP(PC,INTERP) \
        (PC = ((INTERP->op_func_table)[*(_cs+PC)])(PC,INTERP))

    opcode_t *
    runops_fast_core(struct Parrot_Interp *interpreter, opcode_t *pc)
    {
        int _cs = 0;    // interpreter->code->base.data
        while (pc) {
            DO_OP(pc, interpreter);
        }
        return pc;
    }

There is *no* noticeable change of execution speed in the mops program.

Summary:
All addressing in PASM is done in terms of opcodes, while addressing in the runloops is done via absolute code pointers. This makes it necessary to recalculate opcode offsets for each branch that leaves the runloop, or after return from such external code.

Changing the addressing scheme to opcode offsets relative to code start would simplify all kinds of (non local) control flow changes. As real world programs mostly consist of such subroutine calls, these would be simplified a lot (and would then not need leaving the runloop - probably ;-)

The "fast" run loops (compiled C and JIT) would take most advantage of this change.

Comments welcome
leo
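
Sketch of the proposed scheme, as a toy only: a runloop driven by opcode offsets, with HALT at offset 0 and a code_start that an op may swap to reach another code segment. Nothing here is Parrot source; the OffsInterp struct, the op names, the ENTER_SEG op and offs_run_loop() are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;
typedef struct OffsInterp OffsInterp;
/* ops take and return an opcode offset instead of an address */
typedef opcode_t (*offs_op_t)(opcode_t offs, OffsInterp *interp);

/* Hypothetical interpreter state (not Parrot's real struct). */
struct OffsInterp {
    const offs_op_t *op_func_table;
    const opcode_t  *code_start;     /* base of the current segment */
    const opcode_t  *segments[2];    /* toy table of code segments  */
    long             counter;
};

enum { OFS_HALT, OFS_DEC, OFS_BRANCH_IF, OFS_JUMP, OFS_ENTER_SEG };

/* offset 0 always holds HALT; returning 0 ends while(offs) */
static opcode_t ofs_halt(opcode_t offs, OffsInterp *in)
{ (void)offs; (void)in; return 0; }

static opcode_t ofs_dec(opcode_t offs, OffsInterp *in)
{ in->counter--; return offs + 1; }

/* relative branch: plain integer arithmetic on the offset */
static opcode_t ofs_branch_if(opcode_t offs, OffsInterp *in)
{ return in->counter > 0 ? offs + in->code_start[offs + 1] : offs + 2; }

/* absolute jump: the operand already is the new offset */
static opcode_t ofs_jump(opcode_t offs, OffsInterp *in)
{ return in->code_start[offs + 1]; }

/* non-local control flow: adjust code_start, return the target
 * offset, and stay inside the same runloop */
static opcode_t ofs_enter_seg(opcode_t offs, OffsInterp *in)
{
    opcode_t seg    = in->code_start[offs + 1];
    opcode_t target = in->code_start[offs + 2];
    assert(seg >= 0 && seg < 2);
    in->code_start = in->segments[seg];
    return target;
}

static const offs_op_t offs_ops[] = {
    ofs_halt, ofs_dec, ofs_branch_if, ofs_jump, ofs_enter_seg
};

/* Runs a countdown that lives in a second segment; returns the final
 * counter and stores the number of dispatched ops in *ops_run. */
long offs_run_loop(long start_count, long *ops_run)
{
    static const opcode_t seg_main[] = {
        OFS_HALT,                 /* 0: reserved                    */
        OFS_ENTER_SEG, 1, 1       /* 1: go to segment 1, offset 1   */
    };
    static const opcode_t seg_sub[] = {
        OFS_HALT,                 /* 0: reserved                    */
        OFS_DEC,                  /* 1: counter--                   */
        OFS_BRANCH_IF, -1,        /* 2: back to 1 while counter > 0 */
        OFS_JUMP, 0               /* 4: jump to offset 0 == exit    */
    };
    OffsInterp in = { offs_ops, seg_main, { seg_main, seg_sub }, start_count };
    opcode_t offs = 1;            /* all bytecode starts at offset 1 */

    *ops_run = 0;
    while (offs) {
        ++*ops_run;
        offs = in.op_func_table[in.code_start[offs]](offs, &in);
    }
    return in.counter;
}
```

In this toy, a jump to offset 0 is the single exit from the loop, and switching segments needs no leave-and-reenter step because only code_start and the returned offset change. Whether the real runloops could be restructured this cheaply is exactly what the proposal leaves open.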