.NET CLR and Parrot

Simon Cozens Sat, 23 Feb 2002 03:33:58 -0800

I was very lucky recently to attend a talk by Ganesh Sittampalam
introducing Microsoft .NET and the Common Language Runtime. A lot of
what CLR is trying to do is quite similar to what we're doing with
Parrot, so I thought it would be a good idea to briefly recap what's
going on with CLR.


The CLR is, essentially, a stack-based virtual machine optimized for
JITting. The design and specification of the CLR is a published
standard, available from ECMA at http://www.ecma.ch/. (ECMA are the body
responsible for developing the JavaScript - hence now called ECMAScript
- standard.) The CLR is also sometimes called the Common Language
Infrastructure, CLI, although strictly I think the CLI refers to the CLR
VM itself plus the runtime libraries to go along with it.

Naturally, Microsoft have only standardized a functional subset of the
libraries they're implementing; MS .NET has classes for Windows Forms,
for instance, whereas ECMA CLR does not. Similarly, when Microsoft
ported the CLR to FreeBSD (the so-called Rotor project), they implemented
the same stripped-down version of the standard.

The CLR runs a bytecode language, which Microsoft call MSIL when
they're talking about their own implementation of the CLR, and CIL
(Common Intermediate Language) when they're talking to ECMA. It's
object-oriented assembler, a true horror to behold, but it works.

IL is not interpreted; it is JIT compiled. This was probably to avoid
the bad reputation Java got for being slow when it was interpreted and
nobody was producing or using Java compilers. There's nothing to stop it
being interpreted; mint, from the Mono project, is a CLR interpreter.

In order to optimize CLR for JITting, it imposes a number of
restrictions on the IL. For instance, the stack may only be used to
store parameters and return values from operations and calls; you can't
access arbitrary points in the stack; more significantly, the types of
values on the stack have to be statically determinable and invariant.
That's to say, at any given point in the code, you know for sure what
types of things are on the stack at the time.
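
To make the "statically determinable" restriction concrete, here is a
minimal sketch of the kind of check it permits: walk the code once,
tracking only the types on the stack, never the values. The four
opcodes are invented for illustration and are not real CIL; a real
verifier also has to deal with branches, calls and many more types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mini instruction set; these names are NOT real CIL opcodes.
enum Op { PushInt, PushFloat, AddInt, AddFloat }

static class StackTypeCheck
{
    // Walks straight-line code once, tracking only the types on the stack.
    // Returns the types left on the stack (top first), or throws if an
    // operation would see operands of the wrong type.
    public static string[] Check(IEnumerable<Op> code)
    {
        var types = new Stack<string>();
        foreach (Op op in code)
        {
            switch (op)
            {
                case Op.PushInt:   types.Push("int32"); break;
                case Op.PushFloat: types.Push("float"); break;
                case Op.AddInt:    Binary(types, "int32", op); break;
                case Op.AddFloat:  Binary(types, "float", op); break;
            }
        }
        return types.ToArray();
    }

    // Pops two operand types, checks both match, pushes the result type.
    static void Binary(Stack<string> types, string want, Op op)
    {
        if (types.Count < 2)
            throw new InvalidOperationException(op + ": stack underflow");
        string b = types.Pop(), a = types.Pop();
        if (a != want || b != want)
            throw new InvalidOperationException(
                op + ": expected two " + want + ", found " + a + " and " + b);
        types.Push(want);
    }
}

static class Program
{
    static void Main()
    {
        string[] left = StackTypeCheck.Check(
            new[] { Op.PushInt, Op.PushInt, Op.AddInt });
        Console.WriteLine(string.Join(",", left)); // prints "int32"
    }
}
```

Because the answer never depends on runtime values, a JIT can do this
once per method and then emit code that assumes the types it found.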

The types themselves are part of the Common Type System, something every
language compiling to .NET has to conform to. CTS types come in two, uh,
sorts: value types and reference types. Value types are things like
INTVALs and NUMVALs; reference types are anything else which would
normally involve a pointer in C: arrays, STRINGs, PMCs. (They're
references to objects; a reference is like its C++ or Perl namesake.)
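
The practical difference shows up on assignment. As a rough
illustration in C# (the type names PointValue and PointRef are made up
for this sketch), copying a struct gives you a second, independent
value, while copying a class variable gives you a second name for the
same heap object:

```csharp
using System;

struct PointValue { public int X; }  // value type: assignment copies the data
class PointRef { public int X; }     // reference type: assignment copies the reference

static class ValueVsReference
{
    public static int CopyThenMutateValue()
    {
        PointValue a = new PointValue { X = 1 };
        PointValue b = a;   // b is an independent copy of a
        b.X = 2;
        return a.X;         // a is untouched: 1
    }

    public static int CopyThenMutateRef()
    {
        PointRef a = new PointRef { X = 1 };
        PointRef b = a;     // a and b refer to one object on the heap
        b.X = 2;
        return a.X;         // the change is visible through a: 2
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(ValueVsReference.CopyThenMutateValue()); // prints 1
        Console.WriteLine(ValueVsReference.CopyThenMutateRef());   // prints 2
    }
}
```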

There's a smaller subset of CTS called the Common Language
Specification, CLS. Languages *must* implement CLS types, and may
implement their own types as part of the CTS. The CLS ought to be used
in all "outward-facing" APIs where two different languages might meet;
the idea being the data passed between two languages is guaranteed to
have a known meaning and semantics. However, this API restriction is not
enforced by the VM.
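
Although the VM doesn't enforce it, a compiler can. As a sketch of how
this looks from C#, System.CLSCompliantAttribute lets an assembly
declare that its public surface sticks to CLS types, and the C#
compiler will then warn about public members that don't. The Counter
class below is invented for illustration; uint is a CTS type that is
not in the CLS, so the member exposing it is explicitly opted out.

```csharp
using System;

// Ask the compiler to check the public API of this assembly against the CLS.
[assembly: CLSCompliant(true)]

public class Counter
{
    // Private state is not part of the public API, so any CTS type is fine.
    private uint count = 3;

    // CLS-friendly surface: long (System.Int64) is a CLS type.
    public long Count { get { return count; } }

    // uint (System.UInt32) is not a CLS type, so this member is marked as
    // deliberately non-compliant; other languages may not be able to use it.
    [CLSCompliant(false)]
    public uint RawCount { get { return count; } }
}

static class Program
{
    static void Main()
    {
        Counter c = new Counter();
        Console.WriteLine(c.Count);    // prints 3
        Console.WriteLine(c.RawCount); // prints 3
    }
}
```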

Types which can appear on the stack are restricted again; you're allowed
int32, int64, int, float, a reference, a "managed" pointer or an
unmanaged pointer. "Management" is determined by where the pointer comes
from (trusted code is managed) and influences what it's allowed to see
and how it gets GCed. Local arguments may live somewhere other than on
the main stack - this is implementation-defined - in which case they
have access to a richer set of types; but since you've got a reference
to an object, you should be OK.

Other value types include structures and enumerations. Since value types
are passed around on the stack, you can't really have big structures,
since you'd be passing loads of data. There's also the typed reference,
which is a reference plus something storing what sort of reference it
is. Reference types are kept in the heap, managed by garbage collection,
and are referenced on the stack. This is not unlike what we're doing
with PMC and non-PMC registers.

You can convert things between value types and reference types, since
each value type has an associated reference type. (An alternative way to
look at this is that value types *are* an object of some reference
type.) For instance, in C#:

    int i = 125;
    Object int_o = i;

This gives you an Object of type System.Int32, and you can now call
methods on it, etc. This operation - turning a value into an object - is
known as "boxing", and, since reference types live on the heap, can be
used to extend the lifetime of a value beyond its "natural" life on the
stack. Even user-defined structs can be boxed.
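
Here's a slightly longer sketch of the same idea, including the
reverse operation, unboxing, which is a cast back to the value type.
The Celsius struct is a made-up example type. Note that the box holds
a copy, so later changes to the original variable don't show through.

```csharp
using System;

struct Celsius { public double Degrees; }  // hypothetical user-defined value type

static class BoxingDemo
{
    public static int BoxedCopyIsIndependent()
    {
        int i = 125;
        object boxed = i;   // box: the value is copied into a heap object
        i = 7;              // changing the original does not touch the box
        return (int)boxed;  // unbox: still 125
    }

    public static string BoxedTypeName()
    {
        object boxed = 125;
        return boxed.GetType().FullName;  // the box remembers its value type
    }

    public static double BoxUserStruct()
    {
        Celsius c = new Celsius { Degrees = 21.5 };
        object boxed = c;                 // user-defined structs box the same way
        return ((Celsius)boxed).Degrees;  // unbox with a cast, then read the field
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(BoxingDemo.BoxedCopyIsIndependent()); // prints 125
        Console.WriteLine(BoxingDemo.BoxedTypeName());          // prints System.Int32
    }
}
```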

What about IL operations? There are surprisingly few. You can load/store
constants, local variables, arguments, fields and array elements; you
can create and dereference pointers; you can do arithmetic; you can do
conversion, casting and truncating; there are branch ops (including a
built-in lookup-table switch op) and method call ops; there's a special
tail-recursion method-call op; you can throw and handle exceptions; you
can box and unbox; you can create an array and find its length; you can
handle typed references. And that's it. Anything else is outside the
realm of the CLR, and has to be implemented with external methods.

PE files are .NET's executables. They're made up of assemblies and
metadata. Assemblies are collections of classes, much like Java's JAR
files, and they can be distributed separately (as DLLs, in the Windows
world). Assemblies are stamped with a name, version, hash and
signature (a "strong name").

Metadata, on the other hand, comprises type information about classes
and methods, information about obsolescence, etc. You can provide other
metadata for your own use - for instance, if you're implementing your
own JIT, you can provide hints. Metadata is available for introspection
at runtime.
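
In C#, metadata of this kind is attached with attributes and read back
through System.Reflection. The sketch below uses the real
ObsoleteAttribute for the obsolescence case; JitHintAttribute and the
Widget class are hypothetical, standing in for whatever custom
metadata you might want to carry around.

```csharp
using System;
using System.Reflection;

// Hypothetical custom attribute; the name and its Hint field are made up.
[AttributeUsage(AttributeTargets.Method)]
class JitHintAttribute : Attribute
{
    public readonly string Hint;
    public JitHintAttribute(string hint) { Hint = hint; }
}

class Widget
{
    [Obsolete("Use Render instead")]  // standard obsolescence metadata
    public void Draw() { }

    [JitHint("hot")]                  // user-defined metadata
    public void Render() { }
}

static class MetadataDemo
{
    // Returns the hint attached to a Widget method, or "" if there is none.
    public static string HintFor(string methodName)
    {
        MethodInfo m = typeof(Widget).GetMethod(methodName);
        object[] attrs = m.GetCustomAttributes(typeof(JitHintAttribute), false);
        return attrs.Length == 0 ? "" : ((JitHintAttribute)attrs[0]).Hint;
    }

    // Reports whether a Widget method carries the Obsolete attribute.
    public static bool IsObsolete(string methodName)
    {
        MethodInfo m = typeof(Widget).GetMethod(methodName);
        return m.IsDefined(typeof(ObsoleteAttribute), false);
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(MetadataDemo.HintFor("Render"));  // prints hot
        Console.WriteLine(MetadataDemo.IsObsolete("Draw")); // prints True
    }
}
```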

Classes in assemblies are much like Java classes; they have fields,
methods, constructors, and so on. They also have nested types, or inner
classes. Additionally, there are properties, which are fields with
get/set methods: 

    private int MyInt;
    public int SomeInt {
        get {
            Console.WriteLine("I was got.\n");
            return MyInt;
        }
        set {
            Console.WriteLine("I was set.\n");
            MyInt = value;
        }
    }

CLR doesn't currently support closures natively, although there's a
research team working on it and it may be introduced to the spec in
the future.

For more information, http://www.gotdotnet.com/ and
http://www.albahari.com/ are pretty good locations.
-- 
People in a Position to Know, Inc.
