Looking at the inner loop of mops.pasm, by far the most time is spent accessing the Parrot registers.

Some results (-O3 compiled except run_ops_cg.c, Athlon 800, i386/linux):

                  CVS   »micro_ops«
-g (fast_core)     24   117
cgoto_core         19   205
-j (JIT)          782

So I hacked together a modified core.ops and modified mops.pasm to look something like this near the loop:

        arg1 4
        arg2 4
        arg3 3
REDO:   sub I4, I4, I3
        setarg 2, 1
        if I4, REDO
        larg1 4

With this code in core.ops:
arg1:
    arg1 = IREG($1);    // fetch register $1 to global arg1
arg2:
    arg2 = IREG($1);    // fetch register $1 to global arg2
arg3:
    arg3 = IREG($1);    // fetch register $1 to global arg3
setarg 2,1:
    arg2 = arg1;        // set global arg2 = arg1
larg1:
    IREG($1) = arg1;    // store arg1 to register $1
if_i_ic:
    if (arg1 != 0)
        ...
sub_i_i_i:
    arg1 = arg2 - arg3;

and
extern int arg1, arg2, arg3;
defined in core_*ops.c

So the I-register access is substituted by access to 3 global integers.
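To make the substitution concrete, here is a minimal C sketch of the same loop run both ways: once through a register file, once through the three globals. It is only an illustration, not the modified core.ops. The register file `iregs`, the `IREG` macro definition, and the functions `run_with_registers`, `run_with_globals` and `ireg_value` are hypothetical names made up for this example.

```c
#include <assert.h>

/* Hypothetical stand-in for the interpreter's integer register file. */
#define NUM_IREGS 32
static int iregs[NUM_IREGS];
#define IREG(n) (iregs[(n)])

/* The three globals that replace register access inside the loop. */
static int arg1, arg2, arg3;

/* Baseline: every op reads and writes the register file.
 * Returns the number of loop iterations executed. */
long run_with_registers(int start, int step)
{
    long iterations = 0;

    /* The loop only ends when I4 reaches exactly zero. */
    assert(start > 0 && step > 0 && start % step == 0);
    IREG(4) = start;
    IREG(3) = step;
    do {
        IREG(4) = IREG(4) - IREG(3);    /* sub I4, I4, I3 */
        iterations++;
    } while (IREG(4) != 0);             /* if I4, REDO */
    return iterations;
}

/* Variant: registers are loaded into globals once before the loop
 * and stored back once after it. */
long run_with_globals(int start, int step)
{
    long iterations = 0;

    assert(start > 0 && step > 0 && start % step == 0);
    IREG(4) = start;
    IREG(3) = step;

    arg1 = IREG(4);                     /* arg1 4 */
    arg2 = IREG(4);                     /* arg2 4 */
    arg3 = IREG(3);                     /* arg3 3 */
    do {
        arg1 = arg2 - arg3;             /* sub_i_i_i */
        arg2 = arg1;                    /* setarg 2, 1 */
        iterations++;
    } while (arg1 != 0);                /* if_i_ic */
    IREG(4) = arg1;                     /* larg1 4 */
    return iterations;
}

/* Helper to inspect a register after a run. */
int ireg_value(int n)
{
    return IREG(n);
}
```

Both functions leave I4 at zero and execute the same number of iterations; the second one just touches the register file only outside the loop.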

Now, how would these globals be loaded? When are these »arg« OPs inserted?

Currently the register optimizer in jit.c does something very similar: it sets up register access for the most used Parrot registers in one execution block and adds loads and stores at block begin/end.
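As a rough sketch of the "most used registers in one block" idea, the following C function counts how often each integer register appears in a block and hands the hottest ones to the three arg slots. This is not the code in jit.c. The `BlockOp` record, `NUM_SLOTS` and `pick_hot_registers` are hypothetical names, and the selection rule (highest count first, ties going to the lower register number) is an assumption chosen for simplicity.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_IREGS 32
#define NUM_SLOTS 3   /* arg1, arg2, arg3 */

/* Hypothetical op record: up to three integer-register operands,
 * with -1 marking an unused operand position. */
typedef struct {
    int regs[3];
} BlockOp;

/* Fill slots[] with the most used register numbers in the block,
 * in descending order of use; unused slots are set to -1.
 * Returns how many slots were assigned. */
int pick_hot_registers(const BlockOp *ops, size_t n_ops, int slots[NUM_SLOTS])
{
    int counts[NUM_IREGS] = {0};
    int assigned = 0;
    size_t i;
    int k, r, s;

    assert(ops != NULL || n_ops == 0);

    /* Count every register operand in the block. */
    for (i = 0; i < n_ops; i++) {
        for (k = 0; k < 3; k++) {
            r = ops[i].regs[k];
            if (r >= 0 && r < NUM_IREGS)
                counts[r]++;
        }
    }

    /* Repeatedly take the register with the highest remaining count. */
    for (s = 0; s < NUM_SLOTS; s++) {
        int best = -1;
        for (r = 0; r < NUM_IREGS; r++) {
            if (counts[r] > 0 && (best < 0 || counts[r] > counts[best]))
                best = r;
        }
        slots[s] = best;
        if (best >= 0) {
            counts[best] = 0;   /* do not pick it again */
            assigned++;
        }
    }
    return assigned;
}
```

For the mops loop body (`sub I4, I4, I3` and `if I4, REDO`) this picks I4 first and I3 second, leaving the third slot free. The loads would then go at block begin and the stores at block end.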

Pulling the arg opcodes out of the loop is something imcc should do in loop optimization, as well as removing unnecessary load/store operations.
All the needed things like life analysis, basic blocks and loop detection are already there and are working.

leo
