Re: What is wrong with GCC's register transfer language?

Simon Cozens Mon, 03 Dec 2001 09:36:00 -0800

On Mon, Dec 03, 2001 at 10:31:54AM -0700, Nathan Torkington wrote:
> I think it's time to collect these questions into a FAQ.


Agreed.

> Any volunteers?

No way, but they can have the attached to play with.

-- 
"Everything's working, try again in half an hour."-chorus.net tech
support

<!DOCTYPE ARTICLE PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<article>
  <title>Parrot : A Cross-Language Virtual Machine Architecture</title>
  <artheader>
    <authorgroup>
      <author><firstname>Simon</firstname> <surname>Cozens</surname></author>
    </authorgroup>
    <revhistory>
      <revision>
        <revnumber>0.1</revnumber> 
        <date>08 August 2001</date>
      </revision>
    </revhistory>
  </artheader>
  <abstract>
    <para>
      This paper describes the technical and social aims for the
      proposed Parrot shared-bytecode interpreter. It also explains
      the differences between Parrot, Mono and Microsoft's .NET CLR,
      and presents technical details of how Parrot is likely to be
      implemented, along with the rationale behind these decisions.
    </para>
  </abstract>
  <sect1>
    <title>What is Parrot?</title>
    <para>
      At the Perl Conference 2000, Larry Wall announced a new
      direction for Perl; both the language and the interpreter were
      to be redesigned and reworked from scratch. Dan Sugalski took on
      the role of heading up the implementation, and one of his design
      goals for the new interpreter - nicknamed Parrot<footnote>
        <para>After Simon Cozens' April Fools' Day joke, in which Larry
        and Guido van Rossum announced that the Perl and Python
        languages and interpreters were to merge.</para>
      </footnote> - was that it should support the interpretation of
      bytecode-compiled languages other than Perl.
    </para>
    <para>
      Meanwhile, considerable interest was building in the
      programming community as a whole, and in the open source
      community in particular, in Microsoft's .NET architecture and
      Common Language Runtime, culminating in Ximian's "Mono" project,
      an open source implementation of a C# compiler and runtime. (We
      will examine Parrot's relationship to .NET and to Mono in later
      sections.)
    </para>
    <para>
      The grass-roots desire for a common interpreter became more
      apparent when Simon set up the <literal>language-dev</literal>
      mailing list to facilitate discussion between implementors of
      dynamic programming languages such as Perl, Python, Ruby and
      Tcl.
    </para>
    <para>
      Since Perl 6 would need this new interpreter in any case, Eric
      Raymond suggested to the <literal>python-dev</literal> mailing
      list that Python developers get involved in specifying the
      interpreter, to ensure that it would meet their needs for
      interpreting Python bytecode.
    </para>
    <para>
      Parrot is, then, a project to create an interpreter which is
      capable of efficiently executing intermediate bytecode emitted
      by a range of dynamic languages. 
    </para>
  </sect1>
  <sect1>
    <title>How does this relate to .NET?</title>
    <para>
      .NET - or more specifically, the .NET Common Language Runtime
      (<firstterm>CLR</firstterm>) - also claims to be a platform
      capable of efficiently executing bytecode compiled from a range
      of languages. However, the specifications for the CLR limit its
      usefulness for languages that are not as statically typed as
      C#; while it is possible to compile Perl and Python to CLR
      bytecode, the result runs rather slowly.
    </para>
    <para>
      While it is possible to refine the process by which Perl or
      Python is compiled down to the CLR, this loses sight of the big
      picture: there is an impedance mismatch between the static
      nature of the CLR's environment and the dynamic nature of the
      source languages. The mismatch may be overcome by clever
      programming, but it is more fruitful to design an interpreter
      which lacks this mismatch - that is, one designed from the
      beginning to support multiple dynamic languages.
    </para>
    <para>
      Seen from another point of view, the CLR and Parrot are not
      rival technologies but parallel ones - Parrot aims to do for
      dynamic languages what the CLR does for static ones.
    </para>
  </sect1>
  <sect1>
    <title>How does this relate to Mono?</title>
    <para>
      Mono is an open source implementation of the standards
      submitted by Microsoft to ECMA describing the CLR. It is not
      expected to deviate from those standards enough to remove the
      impedance mismatch explained above.
    </para>
    <para>
      Furthermore, Microsoft have expressed their willingness to use
      the patents that they hold on the technologies behind CLR (even
      though these technologies have been sent to ECMA for
      standardisation) in order to regulate and control competing
      implementations. We are not prepared to run the risk of having
      our interpreter hijacked in this way.
    </para>
  </sect1>
  <sect1>
    <title>Why redesign a VM from scratch?</title>
    <para>
      We are, of course, aware that a large number of virtual
      machines already exist; we have spent time examining their
      design and implementation, and will continue to do so.
    </para>
    <para>
      However, we should not forget that Parrot has its roots in the
      Perl 6 project and will become the default Perl 6 interpreter.
      Hence its primary focus is to run Perl 6 quickly, and it will
      need the operations and capabilities that the Perl 6 language
      requires. No existing VM is adequately specialised to Perl 6;
      since none maps directly onto Perl 6's behaviour, adopting one
      would create an impedance mismatch similar to the one that led
      us to reject Microsoft's CLR. We therefore have no choice but to
      create our own VM from scratch.
    </para>
    <para>
      It is extremely important to note that, although Parrot will be
      designed to run Perl 6 quickly, this need not come at the
      expense of efficiency in interpreting Parrot bytecode from
      other languages. It is our desire that Perl and Python - and any
      other languages which wish to use Parrot's capabilities - will
      be considered equally when designing the Parrot interpreter. The
      technical details below explain how we plan to keep the
      interpreter generic while ensuring that it runs each of its
      source languages as efficiently as possible.
    </para>
  </sect1>
  <sect1>
    <title>Technical details</title>
    <para>
      By way of introducing the technical details, it is worth
      pointing out that everything here is subject to change. This
      document will be updated to reflect the current thinking behind
      Parrot, but it is not to be taken as a completed specification.
    </para>
    <sect2>
      <title>Overview</title>
      <para>
        The Parrot interpreter is, like all virtual machines, a
        software CPU. However, unlike many software CPUs, it will more
        closely mirror hardware, primarily in that it will be
        register-based rather than stack-based. Because its design
        resembles that of a hardware CPU, code written for it should
        be easier to compile down to efficient machine code. It also
        allows us to use standard compiler optimization techniques
        which are targeted at assembly languages; there is far more
        literature available on optimizing code for low-level,
        register-based CPUs than for stack-based VMs with macro-ops.
      </para>
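      <para>
        To illustrate why a register-based design needs fewer
        dispatches, here is a minimal sketch in C that evaluates
        a = b + c on a toy stack machine and on a toy register
        machine. The opcodes (<literal>OP_PUSH</literal>,
        <literal>OP_ADD_REG</literal> and so on) and both interpreter
        loops are hypothetical and were invented for this example.
        They are not Parrot's instruction set.
      </para>
      <programlisting><![CDATA[
```c
#include <assert.h>

/* Hypothetical opcodes for illustration only; not Parrot's. */
enum { OP_PUSH, OP_ADD_STACK, OP_POP, OP_ADD_REG, OP_END };

/* Stack machine: operands travel through an implicit stack, so
 * a = b + c takes four instructions: push b, push c, add, pop a. */
long run_stack(const int *code, long *vars, int *dispatches) {
    long stack[16];
    int sp = 0;
    *dispatches = 0;
    for (int pc = 0; code[pc] != OP_END; ) {
        ++*dispatches;
        switch (code[pc]) {
        case OP_PUSH:
            assert(sp < 16);
            stack[sp++] = vars[code[pc + 1]];
            pc += 2;
            break;
        case OP_ADD_STACK:
            sp--;
            stack[sp - 1] += stack[sp];
            pc += 1;
            break;
        case OP_POP:
            vars[code[pc + 1]] = stack[--sp];
            pc += 2;
            break;
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
    return vars[0];
}

/* Register machine: a three-address "add dst, src1, src2" does the
 * same work in a single instruction, much as a hardware CPU would. */
long run_reg(const int *code, long *regs, int *dispatches) {
    *dispatches = 0;
    for (int pc = 0; code[pc] != OP_END; ) {
        ++*dispatches;
        switch (code[pc]) {
        case OP_ADD_REG:
            regs[code[pc + 1]] = regs[code[pc + 2]] + regs[code[pc + 3]];
            pc += 4;
            break;
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
    return regs[0];
}
```
]]></programlisting>
      <para>
        With b = 2 and c = 3 in slots 1 and 2, the stack program
        {OP_PUSH, 1, OP_PUSH, 2, OP_ADD_STACK, OP_POP, 0, OP_END}
        needs four dispatches. The register program
        {OP_ADD_REG, 0, 1, 2, OP_END} needs one. Both leave 5 in
        slot 0. The trade-off is that each register instruction
        carries more operands.
      </para>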
      <para>
        The CPU will have a large instruction set and a large register
        set: 64 registers, with support for word sizes of 32 bits or
        more. Types which must hold pointers will use the native
        machine's pointer size.
      </para>
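      <para>
        The following minimal sketch in C shows what such a register
        file could look like, using the figures from the paragraph
        above: 64 registers, integer registers at least 32 bits wide,
        and pointer-holding registers sized to the native pointer. The
        names (<literal>register_file</literal>,
        <literal>reg_set_int</literal> and so on) are assumptions made
        for illustration. So is the split into separate integer and
        pointer banks.
      </para>
      <programlisting><![CDATA[
```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define NUM_REGS 64  /* register count from the text above */

/* Hypothetical register types: integers are "32-bit or larger";
 * pointer-holding registers use the native machine pointer size. */
typedef int_least32_t vm_int;
typedef void *vm_ptr;

_Static_assert(sizeof(vm_int) * CHAR_BIT >= 32,
               "integer registers must be at least 32 bits wide");

/* Hypothetical layout: one bank of integer registers and one bank
 * of pointer registers, each NUM_REGS entries long. */
typedef struct {
    vm_int int_reg[NUM_REGS];
    vm_ptr ptr_reg[NUM_REGS];
} register_file;

void reg_set_int(register_file *rf, int i, vm_int value) {
    assert(i >= 0 && i < NUM_REGS);
    rf->int_reg[i] = value;
}

vm_int reg_get_int(const register_file *rf, int i) {
    assert(i >= 0 && i < NUM_REGS);
    return rf->int_reg[i];
}

void reg_set_ptr(register_file *rf, int i, vm_ptr value) {
    assert(i >= 0 && i < NUM_REGS);
    rf->ptr_reg[i] = value;
}

vm_ptr reg_get_ptr(const register_file *rf, int i) {
    assert(i >= 0 && i < NUM_REGS);
    return rf->ptr_reg[i];
}
```
]]></programlisting>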
    </sect2>
    <sect2>
      <title>Data Objects and Vtables</title>
      <para>
        The way Parrot will resolve the differences between languages
        is to push off many of the operations onto the structures
        which represent pieces of data inside the interpreter. That is
        to say, the main loop of the Python interpreter will tell a
        variable to increment itself; the type of the variable will
        determine how that incrementation is done.
      </para>
      <para>
        This will be achieved by a system of
        <firstterm>vtables</firstterm> attached to each piece of
        data. Each piece of data will act like an object, and its
        vtable, a structure of function pointers, represents the
        methods that can be called on that object. (Here it becomes
        important not to confuse the object-like behaviour of the
        represented data with the object system of the source
        language. When we talk about calling methods on objects, we
        are referring to performing operations on pieces of data as
        low-level as, say, an integer. We do not care how the source
        language organises its object orientation.)
      </para>
      <para>
        For instance, when a piece of data is told to increment
        itself, it will locate the "increment" function pointer in its
        vtable, and call the function on itself. This allows us to
        keep the semantics of an operation separate between languages;
        different languages will beget objects with different vtables.
      </para>
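      <para>
        The increment example can be sketched in C as follows. Each
        value carries a pointer to its vtable, and the interpreter
        only ever calls through that pointer, so two types can give
        the same operation different semantics. All names here
        (<literal>vm_value</literal>, <literal>vtable</literal>,
        <literal>vm_increment</literal>) are hypothetical. The string
        type's increment is a simplified, lowercase-only imitation of
        Perl's string increment, chosen purely to show
        language-specific behaviour.
      </para>
      <programlisting><![CDATA[
```c
#include <assert.h>
#include <string.h>

typedef struct vm_value vm_value;

/* A vtable is a structure of function pointers: the "methods"
 * that can be called on a piece of data. */
typedef struct {
    const char *name;
    void (*increment)(vm_value *self);
} vtable;

struct vm_value {
    const vtable *vt;
    union {
        long i;
        char str[16];
    } data;
};

/* Integer type: increment adds one. */
static void int_increment(vm_value *self) { self->data.i++; }

/* String type (lowercase letters only): "a" -> "b", "az" -> "ba",
 * "zz" -> "aaa", loosely imitating Perl's string increment. */
static void str_increment(vm_value *self) {
    char *s = self->data.str;
    size_t n = strlen(s);
    for (size_t i = n; i > 0; i--) {
        if (s[i - 1] != 'z') {
            s[i - 1]++;
            return;
        }
        s[i - 1] = 'a';   /* wrap and carry into the next position */
    }
    /* Every letter wrapped: grow by one, e.g. "zz" becomes "aaa". */
    assert(n + 2 <= sizeof self->data.str);
    memmove(s + 1, s, n + 1);
    s[0] = 'a';
}

const vtable int_vtable = { "int", int_increment };
const vtable str_vtable = { "str", str_increment };

/* What the main loop does: it does not inspect the type, it just
 * asks the value to increment itself through its vtable. */
void vm_increment(vm_value *v) { v->vt->increment(v); }
```
]]></programlisting>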
      <para>
        In a sense, this is not dissimilar to the way the Python VM is
        currently implemented; Python also allows new types to be
        implemented, with differing behaviour, simply by defining new
        methods to go in the new type's vtable. However, Parrot's data
        objects will differ from Python's in subtle ways. For
        instance, they will be able to transform themselves into a
        different type, and rather than referring to a separate type
        object, they will contain a pointer directly to their vtable.
      </para>
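      <para>
        Here is a minimal sketch in C of the two differences just
        described; every name is again hypothetical. The value holds
        its vtable pointer directly, with no separate type object. It
        can transform itself into another type by swapping that
        pointer and reinterpreting its payload. Promoting an integer
        to a floating-point number on overflow is just a convenient
        example of such a transformation. It is not a documented
        Parrot rule.
      </para>
      <programlisting><![CDATA[
```c
#include <assert.h>
#include <limits.h>

typedef struct vm_value vm_value;

typedef struct {
    const char *name;
    void (*increment)(vm_value *self);
} vtable;

struct vm_value {
    const vtable *vt;        /* direct vtable pointer, no type object */
    union {
        long i;
        double n;
    } data;
};

static void num_increment(vm_value *self) { self->data.n += 1.0; }
const vtable num_vtable = { "num", num_increment };

static void int_increment(vm_value *self) {
    if (self->data.i == LONG_MAX) {
        /* Incrementing would overflow: the value changes its own type
         * in place by installing a different vtable. */
        double as_num = (double)self->data.i;
        self->vt = &num_vtable;
        self->data.n = as_num + 1.0;
        return;
    }
    self->data.i++;
}
const vtable int_vtable = { "int", int_increment };
```
]]></programlisting>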
    </sect2>
    <sect2>
      <title>Bytecode layout</title>
      <para>
        The Parrot bytecode will be organised on disk as three
        distinct segments:
      </para>
      <variablelist>
        <varlistentry>
          <term>Fixup Segment</term>
          <listitem>
            <para>
              Read-Write. All the real-address pointers are stored
              here, which allows us to generate position-independent
              code. When the bytecode is loaded, the loader will remap
              these position-independent pointers to real addresses.
            </para>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>Constants Segment</term>
          <listitem>
            <para>
              Read-Only. Like the constants section of Python
              bytecode, this holds objects representing string and
              integer constants. The loader (or possibly the
              optimizer) will alter the fixup segment to remap
              constant accesses to point within this segment.
            </para>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>Instruction Segment</term>
          <listitem>
            <para>
              Read-Only. This holds the bytecode representing the
              operations to be performed. This is position-independent
              code; references are either to relative positions
              within the instruction segment, to pointers in the fixup
              segment, or symbolic references to labels.
            </para>
          </listitem>
        </varlistentry>
      </variablelist>
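      <para>
        The relationship between the three segments can be sketched in
        C. On disk, each fixup entry holds a position-independent
        offset into the constants segment. At load time the loader
        overwrites it with a real address, and instructions then refer
        only to fixup slots. The types and functions
        (<literal>fixup_entry</literal>, <literal>fixup_load</literal>,
        <literal>fixup_resolve</literal>) are hypothetical. For
        simplicity, the constants are plain C strings rather than
        constant objects.
      </para>
      <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read-only constants segment (here: an array of C strings). */
typedef struct {
    const char *const *constants;
    size_t n_constants;
} const_segment;

/* One read-write fixup entry: an offset on disk, a real address
 * once the loader has run. */
typedef union {
    uint32_t offset;            /* position-independent form */
    const char *const *ptr;     /* real-address form */
} fixup_entry;

/* Loader step: remap every position-independent offset to a real
 * address inside the constants segment. */
void fixup_load(fixup_entry *fixups, size_t n, const const_segment *cs) {
    for (size_t i = 0; i < n; i++) {
        uint32_t off = fixups[i].offset;   /* read before overwriting */
        assert(off < cs->n_constants);
        fixups[i].ptr = &cs->constants[off];
    }
}

/* Instructions never hold addresses themselves; an operand names a
 * fixup slot, which is resolved through the loaded fixup segment. */
const char *fixup_resolve(const fixup_entry *fixups, uint32_t slot) {
    return *fixups[slot].ptr;
}
```
]]></programlisting>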
      <para>
        It is possible that there will be a source segment to aid
        debugging; this may be overlapped with the constants segment
        so that large string constants inside source code will not be
        duplicated in the bytecode. The source segment may be stripped
        by an optimizer. 
      </para>
    </sect2>
    <sect2>
      <title>Opcodes</title>
      <para>
        Parrot's opcodes will be 32 bits wide, so that opcodes and the
        other internal types all share the same width; this also
        avoids alignment problems inside processors, and simplifies
        converting Parrot bytecode for machines of different
        endianness.
      </para>
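      <para>
        Because every word in the bytecode is 32 bits wide, converting
        bytecode written on a machine of the other endianness is a
        uniform pass over words, with no need to decode instructions
        first. The following minimal sketch in C assumes,
        hypothetically, that the file records its own byte order as a
        flag. The function names are illustrative.
      </para>
      <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one 32-bit word from bytes stored in the file's byte order.
 * Building the value with shifts gives the same result on any host. */
uint32_t read_word(const unsigned char *p, int file_is_big_endian) {
    if (file_is_big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Convert a whole bytecode image of n_words fixed-width words; the
 * loop never needs to know where one instruction ends and the next
 * begins. */
void load_words(uint32_t *out, const unsigned char *in, size_t n_words,
                int file_is_big_endian) {
    assert(in != NULL || n_words == 0);
    for (size_t i = 0; i < n_words; i++)
        out[i] = read_word(in + 4 * i, file_is_big_endian);
}
```
]]></programlisting>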
      <para>
        Parrot will allow the creation of user-defined operations; a
        portion of the opcode table will be reserved for builtins,
        with the rest available for user-defined ops. It is hoped that
        subroutines will compile down to user-defined ops, and that C
        functions from extension modules will be implemented as
        user-defined ops. Within the bytecode, these user-defined ops
        will be lexically scoped; each lexical scope will define an op
        table mapping operations above the built-in watermark to
        relative pointers in the fixup segment.
      </para>
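      <para>
        Opcode lookup with a built-in watermark can be sketched in C
        as follows. Opcodes below the watermark are builtins. Those
        above it are looked up in the current lexical scope's op
        table, which maps them to slots in the fixup segment holding
        real function addresses, for instance C functions from an
        extension module. The watermark value, the types and the
        example extension function <literal>ext_double_r0</literal>
        are all hypothetical.
      </para>
      <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUILTIN_WATERMARK 1024u   /* hypothetical boundary */

typedef void (*op_fn)(long *regs);

/* One lexical scope's op table: entry k maps user-defined opcode
 * BUILTIN_WATERMARK + k to a slot in the loaded fixup segment. */
typedef struct {
    const uint32_t *slot_for_op;
    size_t n_ops;
    const op_fn *fixups;          /* fixup segment after loading */
} lexical_scope;

int is_builtin(uint32_t op) { return op < BUILTIN_WATERMARK; }

/* Dispatch a user-defined op through the current scope's table. */
void run_user_op(const lexical_scope *scope, uint32_t op, long *regs) {
    assert(!is_builtin(op));
    uint32_t k = op - BUILTIN_WATERMARK;
    assert(k < scope->n_ops);
    scope->fixups[scope->slot_for_op[k]](regs);
}

/* Example "extension" op: doubles register 0. */
void ext_double_r0(long *regs) { regs[0] *= 2; }
```
]]></programlisting>
      <para>
        Because the table belongs to a scope, the same user-defined
        opcode number can refer to different functions in different
        scopes.
      </para>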
      <para>
        Opcodes may be overridden; Parrot will guarantee that
        overridable ops will always be looked up in the op table
        before dispatch, whereas non-overridable ops may be dispatched
        directly. 
      </para>
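      <para>
        The distinction between overridable and non-overridable ops
        can be sketched in C. An overridable op always consults the
        override table before falling back to its builtin. A
        non-overridable op is dispatched straight to its builtin and
        can never be intercepted. The op numbering, the flag array and
        the example override <literal>override_abs</literal> are
        hypothetical.
      </para>
      <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*op_fn)(long *regs);

static void builtin_inc(long *r) { r[0] += 1; }
static void builtin_neg(long *r) { r[0] = -r[0]; }

/* Hypothetical numbering: op 0 = inc (non-overridable),
 * op 1 = neg (overridable). */
static const op_fn builtins[] = { builtin_inc, builtin_neg };
static const bool overridable[] = { false, true };
#define N_BUILTINS (sizeof builtins / sizeof builtins[0])

/* overrides may be NULL; a NULL entry means "not overridden". */
void dispatch(unsigned op, const op_fn *overrides, long *regs) {
    assert(op < N_BUILTINS);
    if (overridable[op] && overrides != NULL && overrides[op] != NULL) {
        overrides[op](regs);    /* looked up in the op table first */
        return;
    }
    builtins[op](regs);         /* direct dispatch */
}

/* Example override that turns "neg" into absolute value. */
void override_abs(long *r) {
    if (r[0] < 0)
        r[0] = -r[0];
}
```
]]></programlisting>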
    </sect2>
    <sect2>
      <title>Garbage Collection</title>
      <para></para>
    </sect2>
    <sect2>
      <title>IO Subsystem</title>
      <para>
        Parrot will contain its own generic standard IO subsystem,
        modelled to some degree on Perl 5.6's
        <literal>perlio</literal> system. This allows full control
        over the buffering of filehandles and the ability to add
        arbitrary processing modules between the file and the
        filehandle, much like <literal>sfio</literal>'s line
        discipline model.
      </para>
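      <para>
        The idea of stacking processing modules between filehandle and
        file can be sketched in C. Each layer transforms bytes on
        their way down. Writing through a filehandle runs the data
        through its layers in order before it reaches the file. The
        layer interface, the two example layers and the in-memory
        stand-in for a file are hypothetical. They are not the
        <literal>perlio</literal> or <literal>sfio</literal> APIs, and
        buffering is omitted.
      </para>
      <programlisting><![CDATA[
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layer interface: transform n input bytes into out
 * (capacity cap) and return the number of bytes produced. */
typedef size_t (*io_layer_fn)(const char *in, size_t n, char *out, size_t cap);

/* Example layer: translate "\n" into "\r\n". */
size_t crlf_layer(const char *in, size_t n, char *out, size_t cap) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\n') {
            assert(j < cap);
            out[j++] = '\r';
        }
        assert(j < cap);
        out[j++] = in[i];
    }
    return j;
}

/* Example layer: upper-case ASCII letters. */
size_t upper_layer(const char *in, size_t n, char *out, size_t cap) {
    assert(n <= cap);
    for (size_t i = 0; i < n; i++)
        out[i] = (char)toupper((unsigned char)in[i]);
    return n;
}

#define IO_BUF 256
typedef struct {
    io_layer_fn layers[4];   /* applied in order, top to bottom */
    size_t n_layers;
    char file[IO_BUF];       /* stands in for the underlying file */
    size_t file_len;
} filehandle;

void fh_write(filehandle *fh, const char *data, size_t n) {
    char a[IO_BUF], b[IO_BUF];
    assert(n <= IO_BUF);
    memcpy(a, data, n);
    for (size_t k = 0; k < fh->n_layers; k++) {
        n = fh->layers[k](a, n, b, IO_BUF);
        memcpy(a, b, n);
    }
    assert(fh->file_len + n <= IO_BUF);
    memcpy(fh->file + fh->file_len, a, n);
    fh->file_len += n;
}
```
]]></programlisting>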
      <para>
        Where possible, Parrot's IO system will run in a separate
        thread, to allow IO to be asynchronous.
      </para>
    </sect2>
  </sect1>
</article>
