Re: Improved storage-to-storage architecture performance

Ken Fox Tue, 30 Oct 2001 11:06:32 -0800

Dan Sugalski wrote:
> Hmmm. I'd like to see the two run with the same style dispatcher to get a 
> closer comparison of run speeds. When you say threaded, you're doing more 
> or less what the switch/computed-goto dispatcher does, right?

If you take a look at the op_add code I posted, you'll see that it
uses gcc's "goto *address" feature (labels as values). The bytecode is
pre-processed and the addresses of the ops are stashed directly
into the bytecode stream. There's nothing to compute -- the system
just loads the address and jumps.
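
For anyone who hasn't seen the technique, here is a minimal sketch of that kind of threaded dispatch. It is not the op_add code from the earlier post. The opcode names, the `cell` union, the operand-count table and `run_threaded` are all made up for illustration, and it relies on the GCC/Clang labels-as-values extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for this sketch; not Kakapo's or Parrot's real op set. */
enum { OP_INC, OP_ADD, OP_HALT, OP_COUNT };

/* After pre-processing, a cell holds either an op address or an operand. */
typedef union { void *addr; intptr_t val; } cell;

/* Runs a tiny accumulator program. The program must end with OP_HALT. */
static intptr_t run_threaded(const intptr_t *raw, size_t n)
{
    /* GCC/Clang extension: &&label yields the address of a label. */
    static void *const op_addr[OP_COUNT] = { &&do_inc, &&do_add, &&do_halt };
    static const int operand_count[OP_COUNT] = { 0, 1, 0 };
    cell code[64];
    intptr_t acc = 0;
    size_t i = 0;

    assert(n <= 64);

    /* Pre-process: replace each opcode number with its handler's address. */
    while (i < n) {
        intptr_t op = raw[i];
        assert(op >= 0 && op < OP_COUNT);
        code[i++].addr = op_addr[op];
        for (int k = 0; k < operand_count[op] && i < n; k++, i++)
            code[i].val = raw[i];      /* operands are copied through as-is */
    }

    /* Dispatch: load the stored address and jump. No switch, no table lookup. */
    const cell *pc = code;
    goto *(pc++)->addr;

do_inc:
    acc += 1;
    goto *(pc++)->addr;
do_add:
    acc += (pc++)->val;                /* consume the inline operand */
    goto *(pc++)->addr;
do_halt:
    return acc;
}
```

Feeding it the program { OP_INC, OP_ADD, 40, OP_INC, OP_HALT } returns 42. Each op ends with its own jump, so there is no central loop to return to.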

I'd really like to try different dispatchers out with Parrot too. That's
why I asked if anybody has done a threaded dispatcher yet. (If nobody has,
then I'll do one...)

> >Parrot and Kakapo should have very similar mops when using the
> >same dispatcher.
> 
> In which case I'm not sure I see the win, though I'm definitely curious as 
> to how it stacks up. I think the end result will be essentially the same as 
> far as performance goes, but I'm not sure yet.

The win is in simplicity, for both the VM and the compiler. Register
VMs require the compiler to load things into registers, right? If the
register allocation is good it will be fast. Poor register allocation
will waste time with redundant loads, excessive register spills, etc.

> Ken Fox wrote:
> > What happens when you "goto middle" depends on where you started.
> 
> Personally I'm all for throwing a fatal error, but that's just me.

:)

> If you're copying things around that means you have to do a bunch of 
> pointer fixups too, otherwise you'll have code pointing to the wrong place.

Nope. The bytecode holds pointers to scope definitions (think of them
like templates) and op addresses. Everything else is a [frame:offset]
pair. Entire frames can be moved around without disrupting any code. The
callee can even move around its caller's frames and nothing breaks. (This
is how taking continuations works.)
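
A rough sketch of why no fixups are needed, assuming a frame table indexed by frame number. The `machine` and `operand` structs, the table itself, and the function names are all hypothetical; the real Kakapo layout may well differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout for this sketch only. */
typedef struct { int frame; int offset; } operand;   /* a [frame:offset] pair */

enum { MAX_FRAMES = 4 };

typedef struct {
    long  *frames[MAX_FRAMES];   /* where each frame currently lives */
    size_t sizes[MAX_FRAMES];    /* number of slots in each frame */
} machine;

/* Resolve an operand at the moment of use; code never caches the pointer. */
static long *resolve(machine *m, operand o)
{
    assert(o.frame >= 0 && o.frame < MAX_FRAMES);
    assert(m->frames[o.frame] != NULL);
    assert(o.offset >= 0 && (size_t)o.offset < m->sizes[o.frame]);
    return &m->frames[o.frame][o.offset];
}

/* add dst, a, b: operates on the data where it lives */
static void op_add(machine *m, operand dst, operand a, operand b)
{
    *resolve(m, dst) = *resolve(m, a) + *resolve(m, b);
}

/* Move a frame to fresh memory; only the frame table entry changes. */
static int move_frame(machine *m, int f)
{
    long *copy = malloc(m->sizes[f] * sizeof *copy);
    if (copy == NULL)
        return -1;
    memcpy(copy, m->frames[f], m->sizes[f] * sizeof *copy);
    free(m->frames[f]);
    m->frames[f] = copy;
    return 0;
}

/* Store 2 and 3, move the frame, then add using the same operands. */
static long demo_move(void)
{
    machine m = {0};
    operand x = {0, 0}, y = {0, 1}, sum = {0, 2};
    long result;

    m.sizes[0] = 3;
    m.frames[0] = calloc(3, sizeof(long));
    if (m.frames[0] == NULL)
        return -1;
    *resolve(&m, x) = 2;
    *resolve(&m, y) = 3;
    if (move_frame(&m, 0) != 0) {
        free(m.frames[0]);
        return -1;
    }
    op_add(&m, sum, x, y);       /* operands are unchanged after the move */
    result = *resolve(&m, sum);
    free(m.frames[0]);
    return result;
}
```

The operands `x`, `y` and `sum` stand in for what the bytecode would hold. They stay valid after `move_frame` because they name a slot and not an address.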

> If you're not storing pointers to things in frames, then I don't see the 
> advantage to this scheme, since you're indirect anyway, which is what we 
> are now.

Indirection is absolutely required in both our schemes. The difference
is that a register architecture copies data into a register whereas
a storage-to-storage architecture works on the data in place.
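
To make the difference concrete, here is a toy comparison of the two instruction styles for "c = a + b". Both instruction formats are invented for this sketch and are not Parrot's or Kakapo's. Both loops use a plain switch dispatcher, so the only thing it measures is how many instructions get dispatched, not speed.

```c
#include <assert.h>

/* Hypothetical instruction formats for this sketch. */
enum reg_op { R_LOAD, R_ADD, R_STORE, R_HALT };
typedef struct { enum reg_op op; int a, b, c; } reg_insn;

enum mem_op { M_ADD, M_HALT };
typedef struct { enum mem_op op; int dst, src1, src2; } mem_insn;

/* Register style: data is copied into registers before use.
   Returns the number of instructions dispatched, including the halt. */
static int run_register(const reg_insn *pc, long *mem)
{
    long reg[4] = {0};
    int dispatched = 0;
    for (;; pc++) {
        dispatched++;
        switch (pc->op) {
        case R_LOAD:  reg[pc->a] = mem[pc->b];              break;
        case R_ADD:   reg[pc->a] = reg[pc->b] + reg[pc->c]; break;
        case R_STORE: mem[pc->a] = reg[pc->b];              break;
        case R_HALT:  return dispatched;
        }
    }
}

/* Storage-to-storage style: the op works on the data in place.
   Returns the number of instructions dispatched, including the halt. */
static int run_storage(const mem_insn *pc, long *mem)
{
    int dispatched = 0;
    for (;; pc++) {
        dispatched++;
        switch (pc->op) {
        case M_ADD:  mem[pc->dst] = mem[pc->src1] + mem[pc->src2]; break;
        case M_HALT: return dispatched;
        }
    }
}
```

With memory {2, 3, 0}, the register program load/load/add/store/halt dispatches 5 instructions and the storage program add/halt dispatches 2; both leave 5 in slot 2. A compiler that keeps values in registers across many ops would not pay for the loads every time, which is exactly where register allocation quality comes in.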

You can think of Kakapo as having a single address mode for
everything. Constants, globals, locals, temporaries, etc. are all
identified by [frame:offset]. (In-lined constants are an exception
to this rule, but that's just a performance hack.)

> Smart compilers aren't that tough to build any more--there's a lot of 
> literature for them these days, so it's not much more work to build a
> smart one than it is to build a dumb one.

Sure, agreed. Smart compilers can take some time to run though. For
example, IMHO a compiler has to hoist register loads out of loops to
get decent performance. To do that it has to work out when the local
variable can change -- a pretty expensive analysis.

Maybe it won't matter because PMCs all introduce an extra layer of
indirection anyway. I have no idea whether most things will be
PMCs or if compilers will try to use strongly typed registers for
speed.

- Ken
