On Thu, Aug 30, 2001 at 01:40:15PM -0400, Sam Tregar wrote:
> What I've seen so far resembles a software CPU, which be analogy should be
> able to do anything from run Quake III to compile Java. Does that mean
> it's well-suited to compiling Perl?
The interpreter is highly unsuited to *compiling* Perl. Compiling Perl
is a hard task which we can't do until Larry gives us the language spec.
But even if we do have a language spec, it's extremely difficult to
compile to a target that does not exist. So the priority for us now is
to make that target exist. Hence the need to start coding the
interpreter.
While it's not intended to be good at compilation, the interpreter is
going to be highly suited to running *any* kind of bytecode. There are a
bunch of reasons why I know it'll be flexible enough to run anything
Larry throws at us. For starters, we're taking all the techniques from
all the other interpreters we can find and working out both the
intersection and the union of them. We know what an interpreter needs to
do, and we know what a lot of languages need from their interpreter.
Secondly, I know the interpreter will be able to run anything Larry
throws at it for the paradoxical reason that the interpreter will know
*nothing* about what Larry's likely to throw at it. Does a real CPU
know whether it's running C or Java? Hell no. It runs whatever's in its
native machine code.
I could spend a lot of time justifying it to you here and now, or I
could spend the same time writing a detailed specification of the
interpreter interface. I think, to be honest, it might be more
productive for me to take the second option. However, I don't want you
to think I'm brushing your concerns aside; here's a quick sketch of what
I'm currently thinking.
The way Parrot will resolve the differences between languages
is to push off many of the operations onto the structures
which represent pieces of data inside the interpreter. That is
to say, the main loop of the Parrot interpreter will tell a
variable to increment itself; the type of the variable will
determine how that incrementation is done.
This will be achieved by a system of vtables attached to each
piece of data. Each piece of data will act like an object, and
vtables, which are structures of function pointers, represent the
methods the object can call. (Here it becomes important not to
confuse the object-like behaviour of the represented data with the
object system of the source language. When we talk about calling
methods on objects, we are referring to performing operations on
pieces of data as low-level as, say, an integer. We do not care
how the source language organises its object orientation.)
For instance, when a piece of data is told to increment
itself, it will locate the "increment" function pointer in its
vtable, and call the function on itself. This allows us to
keep the semantics of an operation separate between languages;
different languages will beget objects with different vtables.
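
To make the vtable idea concrete, here is a minimal sketch in C. All of the names (Datum, VTable, op_inc and so on) are hypothetical and are not Parrot's actual interface, and the string increment is a simplified, lowercase-only stand-in for Perl-style string increment. The point is only that the main loop calls one slot and the datum's vtable decides what "increment" means.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names throughout; this is not Parrot's real API. */
typedef struct Datum Datum;

typedef struct VTable {
    const char *type_name;
    void (*increment)(Datum *self);   /* one slot per low-level operation */
} VTable;

struct Datum {
    const VTable *vtable;   /* selects the semantics for this datum */
    long ival;              /* used by the integer type */
    char sval[16];          /* used by the string type */
};

/* Integer semantics: plain arithmetic increment. */
static void int_increment(Datum *self) { self->ival++; }

/* String semantics (assumed, lowercase letters only): "az" becomes "ba". */
static void str_increment(Datum *self) {
    size_t len = strlen(self->sval);
    size_t i = len;
    while (i > 0) {
        i--;
        if (self->sval[i] != 'z') { self->sval[i]++; return; }
        self->sval[i] = 'a';          /* carry into the next position */
    }
    /* Every character carried (or the string was empty): grow by one. */
    if (len + 1 < sizeof self->sval) {
        memmove(self->sval + 1, self->sval, len + 1);
        self->sval[0] = 'a';
    }
}

static const VTable int_vtable = { "integer", int_increment };
static const VTable str_vtable = { "string",  str_increment };

/* What the main loop does: it never inspects the type itself. */
static void op_inc(Datum *d) {
    assert(d && d->vtable && d->vtable->increment);
    d->vtable->increment(d);
}
```

Calling op_inc on a datum built with int_vtable bumps ival, while the same call on a datum built with str_vtable rewrites sval. A second language with different increment rules would supply its own vtable rather than change op_inc.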
In a sense, this is not dissimilar to the way the Python VM is
currently implemented; Python also allows for new types to be
implemented, with differing behaviour, simply by defining new
methods to go in the new type's vtable. However, Parrot's data
objects will differ from Python's in subtle ways - for
instance, they will have the ability to transform themselves
to a different type; they will not have a separate type
object, but directly contain a pointer to their vtable
methods.
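
As a rough illustration of a datum transforming itself, the sketch below swaps the vtable pointer in place. The trigger chosen here (an integer promoting itself to a floating-point number when it would overflow) is purely an assumption for the example, as are all the names; the text above only says that such transformation will be possible and that the datum points directly at its vtable.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical names; the overflow-promotion rule is an assumption. */
typedef struct Datum Datum;

typedef struct VTable {
    const char *type_name;
    void (*increment)(Datum *self);
} VTable;

struct Datum {
    const VTable *vtable;            /* no separate type object in between */
    union { long i; double n; } u;   /* payload, interpreted per vtable */
};

static void num_increment(Datum *self) { self->u.n += 1.0; }
static const VTable num_vtable = { "number", num_increment };

static void int_increment(Datum *self) {
    if (self->u.i == LONG_MAX) {
        /* Would overflow: convert the payload, then change type in place
         * by repointing the vtable, and let the new type finish the job. */
        self->u.n = (double)self->u.i;
        self->vtable = &num_vtable;
        self->vtable->increment(self);
        return;
    }
    self->u.i++;
}
static const VTable int_vtable = { "integer", int_increment };

static void op_inc(Datum *d) {
    assert(d && d->vtable);
    d->vtable->increment(d);
}
```

After op_inc on an integer datum holding LONG_MAX, the same Datum is still at the same address, but its vtable now points at num_vtable, so every later operation gets number semantics without the caller being involved.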
As for opcodes, Parrot will allow the creation of user-defined
operations; a portion of the opcode table will be reserved for
builtins, with the rest available for user-defined ops. It is
hoped that subroutines will compile down to user-defined ops, and
that C functions from extension modules will be implemented as
user-defined ops. Within the bytecode, these user-defined ops will
be lexically scoped; each lexical scope will define an op table
mapping operations above the built-in watermark to relative
pointers in the fix-up section.
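
One way to picture the per-scope op table is sketched below. The watermark value of 256, the OpTable layout and the function name are all assumptions made for illustration; the real bytecode format is not yet specified. Each scope maps op numbers at or above the watermark to relative offsets into the fix-up section.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value: ops below this number are reserved for builtins. */
enum { BUILTIN_WATERMARK = 256 };

/* Hypothetical per-lexical-scope table of user-defined ops. */
typedef struct OpTable {
    size_t count;           /* number of user-defined ops in this scope */
    const long *fixup_rel;  /* fixup_rel[n] is the relative fix-up offset
                               for op number BUILTIN_WATERMARK + n */
} OpTable;

/* Look up a user-defined op in the given scope.
 * Returns 0 and stores the offset on success; returns -1 if the op is a
 * builtin or is not defined in this scope. */
static int resolve_user_op(const OpTable *scope, unsigned op, long *rel_out) {
    assert(scope && rel_out);
    if (op < BUILTIN_WATERMARK)
        return -1;
    size_t slot = op - BUILTIN_WATERMARK;
    if (slot >= scope->count)
        return -1;
    *rel_out = scope->fixup_rel[slot];
    return 0;
}
```

Because the table belongs to the scope, the same op number can resolve to different fix-up entries in two different scopes, or be undefined in one of them.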
Opcodes may be overridden; Parrot will guarantee that overridable
ops will always be looked up in the op table before dispatch,
whereas non-overridable ops may be dispatched directly.
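
The dispatch rule can be sketched as follows. The two ops, the overridable flags and the abs_capped override are invented for the example and are not a proposed op set; the sketch only shows that an overridable op consults the scope's table first, while a non-overridable op goes straight to the builtin.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*OpFunc)(long);

/* Hypothetical op numbers for the example. */
enum { OP_NEG, OP_ABS, OP_COUNT };

static long builtin_neg(long x) { return -x; }
static long builtin_abs(long x) { return x < 0 ? -x : x; }

static const OpFunc builtin[OP_COUNT] = { builtin_neg, builtin_abs };

/* Assumption for the example: only OP_ABS may be overridden. */
static const int overridable[OP_COUNT] = { 0, 1 };

/* Per-scope overrides; a NULL entry means "no override here". */
typedef struct OpTable {
    OpFunc overrides[OP_COUNT];
} OpTable;

/* Sample user-supplied replacement: absolute value capped at 100. */
static long abs_capped(long x) {
    long a = builtin_abs(x);
    return a > 100 ? 100 : a;
}

static long dispatch(const OpTable *scope, int op, long arg) {
    assert(op >= 0 && op < OP_COUNT);
    /* Overridable ops always check the op table before dispatch. */
    if (overridable[op] && scope && scope->overrides[op])
        return scope->overrides[op](arg);
    /* Non-overridable ops (and ops with no override) dispatch directly. */
    return builtin[op](arg);
}
```

With a scope whose table holds abs_capped in the OP_ABS slot, dispatching OP_ABS uses the override; an entry placed in the OP_NEG slot is never consulted, which is what lets non-overridable ops skip the lookup.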
I hope this is enough to whet your appetite. There is more where that
came from, if I can be allowed time to finish. :)
Simon