Leo --

Ah. It seems the point of divergence is slow_core vs. cg_core, et al.

As you have figured out, I've been referring to performance of the non-cg, 
non-prederef, non-JIT (read: "slow"   ;) core.

I don't know much about the CG core, but prederef and JIT should be able 
to work with dynamic optables. For prederef and JIT, optable mucking does 
expire your prederefed and JITted blocks (in general), but for 
conventional use (preamble setup), you don't pay a price during mainline 
execution once you've set up your optable. You only pay an additional cost 
if your program is dynamic enough to muck with its optable in the middle 
somewhere, in which case you have to pay to re-prederef or re-JIT stuff 
(and a use tax like that seems appropriate to me).

Of all the cores, the CG core is the most "crystalized" (rigid), so it 
stands to reason that it would not be a good match for dynamic optables.

While I don't think I'm sophisticated enough to pull it off on my own, I 
do think it should be possible to use what was learned building the JIT 
system to construct the equivalent of a CG core on the fly, given its 
structure. I think the information and basic capabilities are already 
there: The JIT system knows already how to compile a sequence of ops to 
machine code -- using this plus enough know-how to plop in the right JMP 
instructions pretty much gets you there. A possible limitation to the 
coolness, here: I think the JIT system bails out for the non-inline ops 
and just calls the opfunc (please forgive if my understanding of what JIT 
does and doesn't do is out of date). I think the CG core doesn't have to 
take the hit of that extra indirection for non-inline ops. If so, then the 
hypothetical dynamic core construction approach just described would 
approach the speed of the CG core, but would fall somewhat short on 
workloads that involve lots of non-inline ops (FWIW, there are more inline 
ops than not in the current *.ops files).

Then, you get CG(-esque) speed along with the dynamic capabilities. It's 
cheating, to be sure, but I like that kind of cheating.    :)  Further, 
dynamic core construction (DCC) would work with dynamically loaded oplibs 
(presumably using purely the JIT-func-call technique, although I suppose 
it's possible to do even better), where the CG core would not.

It would be interesting to see where DCC would fit on the performance 
spectrum compared to JIT, for mops.pasm and for other examples with 
broader op usage...


Regards,

-- Gregor





Leopold Toetsch <[EMAIL PROTECTED]>
11/04/2002 08:45 AM
 
        To:     [EMAIL PROTECTED]
        cc:     Brent Dax <[EMAIL PROTECTED]>, "'Andy Dougherty'" 
<[EMAIL PROTECTED]>, Josh Wilmes <[EMAIL PROTECTED]>, "'Perl6 
Internals'" <[EMAIL PROTECTED]>
        Subject:        Re: Need for fingerprinting? [was: Re: What to do if 
Digest::MD5 is 
unavailable?]


[EMAIL PROTECTED] wrote:

> Leo --
> 
> ... Optable build time is not a function of program 
> size, but rather of optable size


Ok, I see that, but ...


> I don't think it remains a problem how to run ops from different oplibs 
> _fast_. 


.... the problem is that as soon as there are dynamic oplibs, they can't 
be run in the CGoto core, which is normally the fastest core when 
execution time depends on opcode dispatch time. JIT is (much) faster 
for almost-integer-only code, e.g. mops.pasm, but for more complex 
programs involving PMCs, JIT is currently slower.

> ... Op lookup is already fast ...


I rewrote find_op to build a lookup hash at runtime, when it's needed. 
This is 2-3 times faster than the find_op with the static lookup table 
in the core_ops.c file.


> ... After the preamble, while the program is running, the cost of 
> having a dynamic optable is absolutely *nil*, whether the ops in 
> question were statically or dynamically loaded (if you don't see that, 
> then either I'm very wrong, or I haven't given you the right mental 
> picture of what I'm talking about).


The cost is only almost *nil* if program execution time doesn't depend 
on opcode dispatch time. E.g. mops.pasm has ~50% of its execution time 
in cg_core (i.e. the computed goto core); running the normal fast_core 
slows this down by ~30%.

This might or might not be true for real-life applications, but I hope 
that the optimizer will bring average programs near the above ratios.

Nevertheless I see the need for dynamic oplibs. If e.g. a program pulls 
in obscure.ops, it may as well pay the penalty for using them.


> Regards,
> 
> -- Gregor


leo



