Looking at the inner loop of mops.pasm, by far the most time is spent
accessing the Parrot registers.
Some results (-O3 compiled except run_ops_cg.c, Athlon 800, i386/linux):
                    CVS    »micro_ops«
-g (fast_core)       24        117
cgoto_core:          19        205
-j (JIT)            782
So I hacked together a modified core.ops and modified mops.pasm to look
something like this near the loop:
        arg1 4
        arg2 4
        arg3 3
REDO:   sub I4, I4, I3
        setarg 2, 1
        if I4, REDO
        larg1 4
With this code in core.ops:
    arg1:
        arg1 = IREG($1);    // fetch register $1 to global arg1
    arg2:
        arg2 = IREG($1);
    arg3:
        arg3 = IREG($1);
    setarg 2,1:
        arg2 = arg1;        // set global arg2 = arg1
    larg1:
        IREG($1) = arg1;    // store arg1 to register $1
    if_i_ic:
        if (arg1 != 0)
            ...
    sub_i_i_i:
        arg1 = arg2 - arg3;
and
    extern int arg1, arg2, arg3;
defined in core_*ops.c
So the I-register access is substituted by access to 3 global integers.
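To make the substitution concrete, here is a minimal C sketch of the same loop written both ways. It is not the real core.ops output: the struct interp type, the two-argument IREG macro and both function names are made up for illustration, and the globals are defined locally instead of being extern. It assumes I3 and I4 are positive and that I4 is a multiple of I3, so the loop terminates.

```c
#include <assert.h>

/* Hypothetical stand-in for the interpreter's integer register file. */
struct interp { int int_reg[32]; };
#define IREG(in, n) ((in)->int_reg[(n)])

/* The three globals from the post (extern there, defined here). */
int arg1, arg2, arg3;

/* Every op reads and writes the register file in memory. */
int loop_via_registers(struct interp *in)
{
    int iterations = 0;
    assert(IREG(in, 3) > 0 && IREG(in, 4) > 0 && IREG(in, 4) % IREG(in, 3) == 0);
    do {
        IREG(in, 4) = IREG(in, 4) - IREG(in, 3);   /* sub I4, I4, I3 */
        iterations++;
    } while (IREG(in, 4) != 0);                    /* if I4, REDO */
    return iterations;
}

/* Same loop with the arg ops: load before, store after, globals inside. */
int loop_via_globals(struct interp *in)
{
    int iterations = 0;
    assert(IREG(in, 3) > 0 && IREG(in, 4) > 0 && IREG(in, 4) % IREG(in, 3) == 0);
    arg1 = IREG(in, 4);          /* arg1 4 */
    arg2 = IREG(in, 4);          /* arg2 4 */
    arg3 = IREG(in, 3);          /* arg3 3 */
    do {
        arg1 = arg2 - arg3;      /* sub_i_i_i */
        arg2 = arg1;             /* setarg 2, 1 */
        iterations++;
    } while (arg1 != 0);         /* if_i_ic */
    IREG(in, 4) = arg1;          /* larg1 4 */
    return iterations;
}
```

Both functions leave the same value in I4 and run the same number of iterations; only the second keeps the register file out of the loop body.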
Now, how would these globals be loaded? When are these »arg« OPs inserted?
Currently the register optimizer in jit.c does something very similar:
it sets up register access for the most-used Parrot registers in one
execution block, plus loads and stores at block begin/end.
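The "most used registers in one block" idea can be sketched roughly as below. This is not code from jit.c; pick_hot_registers, NUM_REGS and the flat operand array are assumptions made for the example. It counts how often each register number appears among a block's operands and picks the top few, breaking ties in favour of the lower register number.

```c
#include <assert.h>
#include <string.h>

#define NUM_REGS 32   /* assumed size of one Parrot register set */

/* Pick up to n_slots of the most-used register numbers in one block.
 * operands: register numbers of all operands in the block.
 * hot:      output array with room for n_slots entries.
 * Returns how many registers were picked. */
int pick_hot_registers(const int *operands, int n_operands,
                       int *hot, int n_slots)
{
    int count[NUM_REGS];
    int picked = 0;
    int i, r;

    memset(count, 0, sizeof count);
    for (i = 0; i < n_operands; i++) {
        assert(operands[i] >= 0 && operands[i] < NUM_REGS);
        count[operands[i]]++;
    }
    while (picked < n_slots) {
        int best = -1;
        for (r = 0; r < NUM_REGS; r++)
            if (count[r] > 0 && (best < 0 || count[r] > count[best]))
                best = r;
        if (best < 0)
            break;               /* fewer distinct registers than slots */
        hot[picked++] = best;
        count[best] = 0;         /* do not pick the same register twice */
    }
    return picked;
}
```

For the loop body above (sub I4, I4, I3 followed by if I4, REDO) the operand list is 4, 4, 3, 4, so I4 is picked first and I3 second.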
Pulling the arg opcodes out of the loop is something imcc should do in
loop optimization, along with removing unnecessary load/store operations.
All the needed things like life analysis, basic blocks and loop
detection are already there and are working.
leo