Leo --

Ah. It seems the point of divergence is slow_core vs. cg_core, et al.

As you have figured out, I've been referring to the performance of the non-CG, non-prederef, non-JIT (read: "slow" ;) core. I don't know much about the CG core, but prederef and JIT should be able to work with dynamic optables. For prederef and JIT, optable mucking does expire your prederefed and JITted blocks (in general), but for conventional use (preamble setup), you don't pay a price during mainline execution once you've set up your optable. You only pay an additional cost if your program is dynamic enough to muck with its optable somewhere in the middle, so that you have to pay to re-prederef or re-JIT stuff (and a use tax like that seems appropriate to me).
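For concreteness, here is a minimal C sketch, with invented names (this is not Parrot code), of the expiry scheme just described: the optable carries a generation counter, mucking with the table bumps it, and prederefed/JITted blocks are rebuilt lazily when they are found to be stale.

    /*
     * Sketch only: a dynamic optable with a generation counter.
     * All names here are invented for illustration.
     */
    typedef void (*opfunc_t)(void *interp);

    typedef struct {
        opfunc_t *funcs;      /* op number -> op function            */
        unsigned  generation; /* bumped on every table modification  */
    } optable_t;

    typedef struct {
        void     *code;       /* prederefed or JITted code           */
        unsigned  built_for;  /* generation the block was built from */
    } block_t;

    /* Mucking with the optable invalidates existing blocks... */
    void optable_set(optable_t *t, int opnum, opfunc_t f) {
        t->funcs[opnum] = f;
        t->generation++;
    }

    /*
     * ...but only lazily: a block is rebuilt on entry if and only
     * if it is stale.  A table set up once in the preamble therefore
     * costs nothing during mainline execution; only mid-run mucking
     * pays the re-prederef/re-JIT tax.
     */
    void *block_enter(block_t *b, optable_t *t,
                      void *(*rebuild)(optable_t *)) {
        if (b->built_for != t->generation) {
            b->code      = rebuild(t);  /* re-prederef or re-JIT */
            b->built_for = t->generation;
        }
        return b->code;
    }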
Of all the cores, the CG core is the most "crystallized" (rigid), so it stands to reason that it would not be a good match for dynamic optables. While I don't think I'm sophisticated enough to pull it off on my own, I do think it should be possible to use what was learned in building the JIT system to build the equivalent of a CG core on the fly, given its structure. I think the information and basic capabilities are already there: the JIT system already knows how to compile a sequence of ops to machine code, and that plus enough know-how to plop in the right JMP instructions pretty much gets you there.

A possible limitation to the coolness here: I think the JIT system bails out for non-inline ops and just calls the opfunc (please forgive me if my understanding of what JIT does and doesn't do is out of date). The CG core, I believe, doesn't have to take the hit of that extra indirection for non-inline ops. If so, then the hypothetical dynamic core construction (DCC) approach just described would approach the speed of the CG core, but would fall somewhat short on workloads that involve lots of non-inline ops (FWIW, there are more inline ops than not in the current *.ops files). You then get CG(-esque) speed along with the dynamic capabilities. It's cheating, to be sure, but I like that kind of cheating. :)

Further, DCC would work with dynamically loaded oplibs (presumably using purely the JIT-func-call technique, although I suppose it's possible to do even better), where the CG core would not. It would be interesting to see where DCC would fit on the performance spectrum compared to JIT, for mops.pasm and for other examples with broader op usage...

Regards,

-- Gregor
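To make the "crystallized" point concrete, here is a minimal C sketch, with invented names (not the actual Parrot cores), contrasting function-table dispatch, which tolerates runtime table changes, with GCC computed-goto dispatch, whose jump targets are fixed when the core is compiled.

    /*
     * Sketch only.  Function-core style: an op returns the next pc,
     * and dispatch goes through a table whose entries can be swapped
     * at runtime, so dynamically loaded ops just work.
     */
    typedef long *(*opfunc_t)(long *pc, void *interp);

    void run_func_core(long *pc, void *interp, opfunc_t *optable) {
        while (pc)                       /* an op returns NULL to halt */
            pc = optable[*pc](pc, interp);
    }

    /*
     * CG-core style (GCC computed goto): the label addresses are
     * baked into the compiled core, so an op that didn't exist at
     * compile time has no label to jump to.
     */
    void run_cg_core(long *pc) {
        static void *labels[] = { &&op_end, &&op_noop /* , ... */ };

        goto *labels[*pc];
    op_noop:
        pc += 1;                 /* advance past this op...           */
        goto *labels[*pc];       /* ...and dispatch the next one      */
    op_end:
        return;
    }

Since the label array is fixed when the core is compiled, a DCC-style core would have to emit the equivalent jumps at runtime instead, which is essentially the JMP-plopping described above.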
Leopold Toetsch <[EMAIL PROTECTED]>
11/04/2002 08:45 AM

To: [EMAIL PROTECTED]
cc: Brent Dax <[EMAIL PROTECTED]>, "'Andy Dougherty'" <[EMAIL PROTECTED]>, Josh Wilmes <[EMAIL PROTECTED]>, "'Perl6 Internals'" <[EMAIL PROTECTED]>
Subject: Re: Need for fingerprinting? [was: Re: What to do if Digest::MD5 is unavailable?]

[EMAIL PROTECTED] wrote:

> Leo --
>
> ... Optable build time is not a function of program
> size, but rather of optable size

Ok, I see that, but ...

> I don't think it remains a problem how to run ops from different oplibs
> _fast_.

... the problem is that as soon as there are dynamic oplibs, they can't be run in the CGoto core, which is normally the fastest core when execution time depends on opcode dispatch time. JIT is (much) faster for almost-integer-only code, e.g. mops.pasm, but for more complex programs, involving PMCs, JIT is currently slower.

> ... Op lookup is already fast ...

I rewrote find_op to build a lookup hash at runtime, when it's needed. This is 2-3 times faster than the find_op with the static lookup table in the core_ops.c file.

> ... After the preamble, while the program is running, the cost of
> having a dynamic optable is absolutely *nil*, whether the ops in
> question were statically or dynamically loaded (if you don't see
> that, then either I'm very wrong, or I haven't given you the right
> mental picture of what I'm talking about).

The cost is only almost *nil* if program execution time doesn't depend on opcode dispatch time. E.g. mops.pasm has ~50% execution time in cg_core (i.e. the computed goto core); running the normal fast_core slows this down by ~30%. This might or might not be true for RL applications, but I hope that the optimizer will bring us near the above relations for average programs.

Nevertheless, I see the need for dynamic oplibs. If e.g. a program pulls in obscure.ops, it could as well pay the penalty for using these.

> Regards,
>
> -- Gregor

leo
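A minimal C sketch, with invented names (not the actual find_op rewrite), of a lazily built lookup hash in the spirit of what is described above: the hash is constructed on the first lookup, so programs that never look ops up by name never pay for it.

    /* Sketch only: lazy name -> opnum lookup hash. */
    #include <stdlib.h>
    #include <string.h>

    #define BUCKETS 256

    typedef struct entry {
        const char   *name;
        int           opnum;
        struct entry *next;
    } entry_t;

    static entry_t *buckets[BUCKETS];
    static int      built = 0;

    static unsigned hash_name(const char *s) {
        unsigned h = 5381;
        while (*s)
            h = h * 33 + (unsigned char)*s++;
        return h % BUCKETS;
    }

    /* op_names[] stands in for the static table in core_ops.c */
    static void build_lookup(const char **op_names, int n_ops) {
        int i;
        for (i = 0; i < n_ops; i++) {
            unsigned b = hash_name(op_names[i]);
            entry_t *e = malloc(sizeof *e);
            e->name    = op_names[i];
            e->opnum   = i;
            e->next    = buckets[b];
            buckets[b] = e;
        }
        built = 1;
    }

    /* Build the hash the first time a lookup is needed; after that,
     * a lookup is a hash probe instead of a scan of the whole table. */
    int find_op(const char *name, const char **op_names, int n_ops) {
        entry_t *e;
        if (!built)
            build_lookup(op_names, n_ops);
        for (e = buckets[hash_name(name)]; e; e = e->next)
            if (strcmp(e->name, name) == 0)
                return e->opnum;
        return -1;    /* unknown op */
    }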