On Mon, Dec 03, 2001 at 10:31:54AM -0700, Nathan Torkington wrote:
> I think it's time to collect these questions into a FAQ.

Agreed.

> Any volunteers?

No way, but they can have the attached to play with.

-- 
"Everything's working, try again in half an hour."-chorus.net tech support
<!DOCTYPE ARTICLE PUBLIC "-//OASIS//DTD DocBook V3.1//EN"> <article> <title>Parrot: A Cross-Language Virtual Machine Architecture</title> <artheader> <authorgroup> <author><firstname>Simon</firstname> <surname>Cozens</surname></author> </authorgroup> <revhistory> <revision> <revnumber>0.1</revnumber> <date>08 August 2001</date> </revision> </revhistory> </artheader> <abstract> <para> This paper describes the technical and social aims for the proposed Parrot shared-bytecode interpreter. It also explains the differences between Parrot, Mono and Microsoft's .NET CLR, and details some technical information about how Parrot is likely to be implemented and the rationale behind these decisions. </para> </abstract> <sect1> <title>What is Parrot?</title> <para> At the Perl Conference 2000, Larry Wall announced a new direction for Perl; both the language and the interpreter were to be redesigned and reworked from scratch. Dan Sugalski took on the role of heading up the implementation, and one of his design goals for the new interpreter - nicknamed Parrot<footnote> <para>After Simon Cozens' April Fools' joke in which Larry and Guido van Rossum announced that the Perl and Python languages and interpreters were to merge.</para> </footnote> - was that it should support the interpretation of bytecode-compiled languages other than Perl. </para> <para> At the same time, Microsoft's .NET architecture and Common Language Runtime were generating considerable interest in the programming community as a whole, and in the open source community in particular, culminating in Ximian's "Mono" project, an open source implementation of a C# compiler and runtime. (We will examine Parrot's relationship to .NET and to Mono in later sections.) 
</para> <para> The grass-roots desire for a common interpreter became further apparent when Simon set up the <literal>language-dev</literal> mailing list to facilitate discussion between implementors of various dynamic programming languages, such as Perl, Python, Ruby and Tcl, amongst others. </para> <para> Given that Perl 6 needed this new interpreter, Eric Raymond suggested to the <literal>python-dev</literal> mailing list that Python developers get involved in assisting the specification of the interpreter to ensure that it would be able to support their needs in interpreting Python bytecode. </para> <para> Parrot is, then, a project to create an interpreter which is capable of efficiently executing intermediate bytecode emitted by a range of dynamic languages. </para> </sect1> <sect1> <title>How does this relate to .NET?</title> <para> .NET - or more specifically, the .NET Common Language Runtime (<firstterm>CLR</firstterm>) - also claims to be a platform capable of efficiently executing bytecode compiled from a range of languages. However, the specifications for the CLR restrict its utility for handling languages which are not as strongly typed as C#; while it is possible to compile Perl and Python to CLR bytecode, the result ends up being rather slow. </para> <para> While it is possible to refine the process by which Perl or Python is compiled down to CLR, this loses sight of the big picture - there is an impedance mismatch between the static nature of the CLR's environment and the dynamic nature of the source languages. The mismatch may be overcome by clever programming, but it is more fruitful to design an interpreter which lacks this mismatch - that is, one designed from the beginning to support multiple dynamic languages. </para> <para> Looking at it from another point of view, CLR and Parrot are not rival technologies but parallel ones - Parrot does for dynamic languages what the CLR does for static ones. 
</para> </sect1> <sect1> <title>How does this relate to Mono?</title> <para> Mono is an open source implementation of the standards submitted by Microsoft to the ECMA detailing the CLR. It is not expected to deviate from the standards sufficiently to remove the impedance mismatch explained above. </para> <para> Furthermore, Microsoft have expressed their willingness to use the patents that they hold on the technologies behind CLR (even though these technologies have been sent to ECMA for standardisation) in order to regulate and control competing implementations. We are not prepared to run the risk of having our interpreter hijacked in this way. </para> </sect1> <sect1> <title>Why redesign a VM from scratch?</title> <para> We are obviously aware that there are a large number of virtual machines in existence, and we have spent time and will spend more time examining the design and implementation of these VMs. </para> <para> However, we should not forget that Parrot does have its roots in the Perl 6 project and it will end up becoming the default Perl 6 interpreter. Hence, its primary focus is to run Perl 6 quickly, and it will need the operations and capabilities that the Perl 6 language requires. There is currently no VM which is adequately specialised to Perl 6; given that any existing VM would not map directly to Perl 6's behaviour, we have an impedance mismatch similar to the one above which led us to reject Microsoft's CLR. This means we have no choice but to create our own VM from scratch. </para> <para> It is extremely important to note that, although Parrot will be designed to have the capabilities to run Perl 6 quickly, this does not necessarily have to be at the expense of efficiency in interpreting Parrot bytecode from other languages. It is our desire that Perl and Python - and any other languages which wish to use Parrot's capabilities - will be considered equally when designing the Parrot interpreter. 
The technical details below will explain how we plan to keep the interpreter generic while ensuring that it will run each of its source languages as efficiently as possible. </para> </sect1> <sect1> <title>Technical details</title> <para> By way of introducing the technical details, it is worth pointing out that everything here is subject to change. This document will be updated to reflect the current thinking behind Parrot, but it is not to be taken as a completed specification. </para> <sect2> <title>Overview</title> <para> The Parrot interpreter is, like all virtual machines, a software CPU. However, unlike many software CPUs, the Parrot interpreter will more closely mirror hardware - primarily, in that it shall be register-based rather than stack-based. By resembling the hardware CPU design, it should be easier to create code which can be compiled down to efficient machine language. It also allows us to use standard compiler optimization techniques which are targeted for assembly languages; there is far more literature available on optimizing code for low-level, register-based CPUs than for stack-based VMs with macro-ops. </para> <para> The CPU will have a large instruction set and a large set of 64 registers, with support for 32-bit or larger word sizes. Some types which are required to hold pointers will be the native machine's pointer size. </para> </sect2> <sect2> <title>Data Objects and Vtables</title> <para> The way Parrot will resolve the differences between languages is to push off many of the operations onto the structures which represent pieces of data inside the interpreter. That is to say, the main loop of the Parrot interpreter will tell a variable to increment itself; the type of the variable will determine how that incrementation is done. </para> <para> This will be achieved by a system of <firstterm>vtables</firstterm> attached to each piece of data. 
Each piece of data will act like an object, and vtables, which are structures of function pointers, represent the methods the object can call. (Here it becomes important not to confuse the object-like behaviour of the represented data with the object system of the source language. When we talk about calling methods on objects, we are referring to performing operations on pieces of data as low-level as, say, an integer. We do not care how the source language organises its object orientation.) </para> <para> For instance, when a piece of data is told to increment itself, it will locate the "increment" function pointer in its vtable, and call the function on itself. This allows us to keep the semantics of an operation separate between languages; different languages will beget objects with different vtables. </para> <para> In a sense, this is not dissimilar to the way the Python VM is currently implemented; Python also allows for new types to be implemented, with differing behaviour, simply by defining new methods to go in the new type's vtable. However, Parrot's data objects will differ from Python's in subtle ways - for instance, they will have the ability to transform themselves to a different type; they will not have a separate type object, but directly contain a pointer to their vtable methods. </para> </sect2> <sect2> <title>Bytecode layout</title> <para> The Parrot bytecode will be organised on disk as three distinct segments: </para> <variablelist> <varlistentry> <term>Fixup Segment</term> <listitem> <para> Read-Write. This has all the real-address pointers stored in it, allowing us to generate position-independent code. When bytecode is loaded, it will remap the position-independent pointers to real address pointers. </para> </listitem> </varlistentry> <varlistentry> <term>Constants Segment</term> <listitem> <para> Read-Only. Like Python bytecode's constant section, this holds objects representing string and integer constants. 
The loader (or possibly the optimizer) will alter the fixup segment to remap constant accesses to point within this segment. </para> </listitem> </varlistentry> <varlistentry> <term>Instruction Segment</term> <listitem> <para> Read-Only. This holds the bytecode representing the operations to be performed. This is position-independent code; references are either to relative positions within the instruction segment, to pointers in the fixup segment, or to symbolic labels. </para> </listitem> </varlistentry> </variablelist> <para> It is possible that there will be a source segment to aid debugging; this may be overlapped with the constants segment so that large string constants inside source code will not be duplicated in the bytecode. The source segment may be stripped by an optimizer. </para> </sect2> <sect2> <title>Opcodes</title> <para> Parrot's opcodes will be 32 bits wide so that all internal types will be the same width; this also avoids alignment problems inside processors, and simplifies the conversion of Parrot bytecode to differently-endian machines. </para> <para> Parrot will allow the creation of user-defined operations; a portion of the opcode table will be reserved for builtins, with the rest available for user-defined ops. It is hoped that subroutines will compile down to user-defined ops, and that C functions from extension modules will be implemented as user-defined ops. Within the bytecode, these user-defined ops will be lexically scoped; each lexical scope will define an op table mapping operations above the built-in watermark to relative pointers in the fixup segment. </para> <para> Opcodes may be overridden; Parrot will guarantee that overridable ops will always be looked up in the op table before dispatch, whereas non-overridable ops may be dispatched directly. 
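The dispatch rule just described can be sketched roughly as follows. This is only an illustration in Python, not Parrot code: the op numbers, the <literal>BUILTIN_WATERMARK</literal> value, the <literal>Scope</literal> class and the single accumulator are all assumptions made for the example, and op table entries are plain callables rather than relative pointers into the fixup segment.
<programlisting>
```python
# Hypothetical sketch: op numbers, names and the Scope class are
# illustrative assumptions, not part of the Parrot design.

BUILTIN_WATERMARK = 3  # ops below this number are built-ins (assumed value)

# Built-in ops: op number -> (function, overridable flag)
BUILTINS = {
    0: (lambda acc: acc, False),      # noop: dispatched directly
    1: (lambda acc: acc + 1, True),   # inc: may be overridden
    2: (lambda acc: acc * 2, False),  # double: dispatched directly
}


class Scope:
    """A lexical scope carrying its own op table."""

    def __init__(self, user_ops=None, overrides=None):
        self.user_ops = user_ops or {}    # ops at or above the watermark
        self.overrides = overrides or {}  # replacements for built-ins


def dispatch(op, acc, scope):
    """Run one op against the accumulator and return the new value."""
    if op >= BUILTIN_WATERMARK:
        # User-defined op: always resolved through the scope's op table.
        return scope.user_ops[op](acc)
    func, overridable = BUILTINS[op]
    if overridable:
        # Overridable built-in: consult the op table before dispatch.
        func = scope.overrides.get(op, func)
    # Non-overridable built-ins never consult the op table.
    return func(acc)


def run(program, scope, acc=0):
    """Execute a list of op numbers in order."""
    for op in program:
        acc = dispatch(op, acc, scope)
    return acc
```
</programlisting>
With these assumed definitions, a scope that overrides op 1 sees its own version run, while an attempted override of the non-overridable op 2 is simply ignored. The per-dispatch lookup cost is paid only by ops that are marked overridable or that lie above the watermark.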
</para> </sect2> <sect2> <title>Garbage Collection</title> <para></para> </sect2> <sect2> <title>IO Subsystem</title> <para> Parrot will contain its own generic standard IO subsystem, modelled to some degree on Perl 5.6's <literal>perlio</literal> system. This allows full control over the buffering of filehandles and the ability to add arbitrary processing modules between the file and the filehandle, much like <literal>sfio</literal>'s line discipline model. </para> <para> Where possible, Parrot's IO system will run in a separate thread, to allow IO to be asynchronous. </para> </sect2> </sect1> </article>