I was very lucky recently to attend a talk by Ganesh Sittampalam introducing Microsoft .NET and the Common Language Runtime. A lot of what the CLR is trying to do is quite similar to what we're doing with Parrot, so I thought it would be a good idea to briefly recap what's going on with the CLR.
The CLR is, essentially, a stack-based virtual machine optimized for JITting. Its design and specification are a published standard, available from ECMA at http://www.ecma.ch/. (ECMA are also the body responsible for the JavaScript standard, hence the name ECMAScript.) The CLR is also sometimes called the Common Language Infrastructure, or CLI, although strictly I think the CLI refers to the CLR VM itself plus the runtime libraries that go along with it. Naturally, Microsoft have only standardized a functional subset of the libraries they're implementing; MS .NET has classes for Windows Forms, for instance, whereas ECMA CLR does not. Similarly, when Microsoft ported the CLR to FreeBSD (the so-called Rotor project), they implemented the same stripped-down version of the standard.

The CLR runs a bytecode language, which Microsoft call MS-IL when they're talking about their own implementation, and CIL (Common Intermediate Language) when they're talking to ECMA. It's object-oriented assembler, a true horror to behold, but it works. In Microsoft's implementation, IL is not interpreted; it is JIT compiled. This was probably to avoid the bad reputation Java got for being slow back when it was interpreted and nobody was producing or using Java compilers. There's nothing to stop IL being interpreted, though; mint, from the Mono project, is a CLR interpreter.

To make JITting easier, the CLR imposes a number of restrictions on the IL. For instance, the stack may only be used to store parameters and return values from operations and calls, and you can't access arbitrary points in the stack. More significantly, the types of values on the stack have to be statically determinable and invariant. That's to say, at any given point in the code, you know for sure what types of things are on the stack. The types themselves are part of the Common Type System, which every language compiling to .NET has to conform to. CTS types come in two, uh, sorts: value types and reference types.
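To see why statically known stack types help a JIT, here is a toy verifier written in modern C# (top-level statements, which postdate this article). It is a sketch under loud assumptions: the opcode names mimic CIL, but this five-opcode set and its typing rules are made up for illustration rather than taken from the ECMA spec, and `VerifyStack` is a hypothetical helper. It tracks the type of each stack slot, never its value, so a program that adds an int32 to a float is rejected before it runs.

```csharp
using System;
using System.Collections.Generic;

// Toy stack-type verifier. The opcode names mimic CIL, but this five-opcode
// set and its typing rules are simplified assumptions, not the ECMA rules.
// It tracks the *type* of each stack slot, never its value, and handles
// straight-line code only (no branches).
List<string> VerifyStack(string[] program)
{
    var stack = new Stack<string>();   // holds type names such as "int32"

    // Pop one slot's type, rejecting the program on underflow.
    string PopType(string op) =>
        stack.Count > 0
            ? stack.Pop()
            : throw new InvalidOperationException($"{op}: stack underflow");

    foreach (string op in program)
    {
        switch (op)
        {
            case "ldc.i4": stack.Push("int32"); break;                   // push an int32 constant
            case "ldc.r8": stack.Push("float"); break;                   // push a float constant
            case "conv.r8": PopType(op); stack.Push("float"); break;     // convert top slot to float
            case "pop": PopType(op); break;
            case "add":
                string right = PopType(op);
                string left = PopType(op);
                if (left != right)   // no implicit int32/float mixing
                    throw new InvalidOperationException($"add: cannot add {left} and {right}");
                stack.Push(left);
                break;
            default:
                throw new InvalidOperationException($"unknown opcode: {op}");
        }
    }

    var types = new List<string>(stack);   // Stack<T> enumerates top-first,
    types.Reverse();                       // so flip to bottom-first order
    return types;
}

Console.WriteLine(string.Join(", ", VerifyStack(new[] { "ldc.i4", "conv.r8", "ldc.r8", "add" })));   // float
```

A real verifier also has to cope with branches: wherever two control paths meet, the stack types they arrive with must agree, which is the "invariant" requirement described above.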
Value types are things like INTVALs and NUMVALs; reference types are anything else which would normally involve a pointer in C: arrays, STRINGs, PMCs. (These are references to objects, where "reference" means much what it does in C++ or Perl.)

There's a smaller subset of the CTS called the Common Language Specification, or CLS. Languages *must* implement the CLS types, and may implement their own types as part of the CTS. The CLS ought to be used in all "outward-facing" APIs where two different languages might meet, the idea being that data passed between two languages is guaranteed to have a known meaning and semantics. However, this API restriction is not enforced by the VM.

The types which can appear on the stack are restricted further: you're allowed int32, int64, native int, float, an object reference, a "managed" pointer or an unmanaged pointer. "Management" is determined by where the pointer comes from (trusted code is managed), and it influences what the pointer is allowed to see and how it gets GCed. Local variables and arguments may live somewhere other than on the main stack (this is implementation-defined), in which case they have access to a richer set of types; but since you've got a reference to an object, you should be OK.

Other value types include structures and enumerations. Since value types are passed around on the stack, you don't really want big structures, since you'd be copying loads of data. There's also the typed reference, which is a reference plus something recording what sort of reference it is.

Reference types are kept in the heap, managed by garbage collection, and are referenced from the stack. This is not unlike what we're doing with PMC and non-PMC registers. You can convert things between value types and reference types, since each value type has an associated reference type. (An alternative way to look at this is that every value of a value type can be treated as an object of some reference type.) For instance, in C#:

```csharp
int i = 125;
object int_o = i;                 // copies i into a heap object of type System.Int32
string text = int_o.ToString();   // methods can now be called on it
```

This gives you an object of type System.Int32, and you can now call methods on it, and so on.
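A short C# sketch of the difference between the two sorts of type, and of boxing (copying a value into a heap object). It uses modern syntax for brevity: the value tuple is a `System.ValueTuple`, a struct, which postdates this article. Assigning a value type copies it, assigning a reference type shares one heap object, and boxing takes a fresh copy that later changes to the original variable don't touch.

```csharp
using System;

// Value type: assignment copies. A value tuple is a System.ValueTuple, a struct.
var point = (X: 1, Y: 2);
var copy = point;
copy.X = 99;                      // changes the copy only
Console.WriteLine(point.X);       // 1

// Reference type: assignment copies the reference, so both names share one array.
int[] cells = { 1, 2 };
int[] alias = cells;
alias[0] = 99;
Console.WriteLine(cells[0]);      // 99

// Boxing: the int is copied into a new heap object of type System.Int32.
int i = 125;
object boxed = i;
i = 7;                            // the boxed copy is unaffected
Console.WriteLine($"{boxed.GetType()} {boxed}");   // System.Int32 125

// Unboxing copies the value back out; the cast must name the exact value type.
int unboxed = (int)boxed;
```

Casting `boxed` to `long` instead of `int` would throw an `InvalidCastException`, since unboxing checks the exact type stored in the box.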
Turning a value into an object is known as "boxing". Since reference types live on the heap, boxing can be used to extend the lifetime of a value beyond its "natural" life on the stack. Even user-defined structs can be boxed.

What about IL operations? There are surprisingly few. You can load and store constants, local variables, arguments, fields and array elements; you can create and dereference pointers; you can do arithmetic; you can do conversion, casting and truncation; there are branch ops (including a built-in lookup-table switch op) and method call ops; there's a special tail-call prefix for method calls; you can throw and handle exceptions; you can box and unbox; you can create an array and find its length; and you can handle typed references. And that's it. Anything else is outside the realm of the CLR, and has to be implemented with external methods.

PE files are .NET's executables. They're made up of assemblies and metadata. Assemblies are collections of classes, much like Java's JAR files, and they can be distributed separately (as DLLs, in the Windows world). Assemblies are stamped with a name, version, md5sum and signature, which together form a "strong name". Metadata, on the other hand, comprises type information about classes and methods, information about obsolescence, and so on. You can provide other metadata for your own use; for instance, if you're implementing your own JIT, you can provide hints. Metadata is available for introspection at runtime.

Classes in assemblies are much like Java classes; they have fields, methods, constructors, and so on. They also have nested types, or inner classes.
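Runtime introspection of metadata is what .NET calls reflection. A minimal C# sketch, assuming a current .NET runtime: it reads an assembly's name and version, looks up a method by name and parameter types and calls it late-bound, and checks that a nested type knows its declaring type. The assembly name printed depends on the runtime, and `Environment.SpecialFolder` is used only because it is a convenient nested type in the base library.

```csharp
using System;
using System.Reflection;

// Assembly identity recorded in metadata. The name depends on the runtime
// (e.g. "mscorlib" on .NET Framework, "System.Private.CoreLib" on modern .NET).
AssemblyName core = typeof(string).Assembly.GetName();
Console.WriteLine($"{core.Name}, Version={core.Version}");

// Method metadata: find String.Substring(int) by name and parameter types,
// then call it late-bound through reflection.
MethodInfo substring = typeof(string).GetMethod("Substring", new[] { typeof(int) });
object tail = substring.Invoke("Parrot", new object[] { 2 });
Console.WriteLine(tail);   // rrot

// Nested types: Environment.SpecialFolder is an enum declared inside Environment.
Type nested = typeof(Environment.SpecialFolder);
Console.WriteLine($"{nested.Name} is nested in {nested.DeclaringType.Name}: {nested.IsNested}");
```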
Additionally, there are properties, which are fields with get/set methods:

```csharp
using System;

class Example
{
    private int MyInt;

    public int SomeInt
    {
        get { Console.WriteLine("I was got."); return MyInt; }
        set { Console.WriteLine("I was set."); MyInt = value; }   // 'value' is the implicit setter argument
    }
}
```

The CLR doesn't currently support closures natively, although there's a research team working on it and closures may be added to the spec in the future. For more information, http://www.gotdotnet.com/ and http://www.albahari.com/ are pretty good places to look.
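On the closures point: later versions of C# did add lambdas that capture local variables, implemented by the compiler rather than the VM. In this sketch (the counter is a made-up example, and the exact shape of the generated code is a compiler implementation detail), the captured `count` becomes a field of a hidden compiler-generated class, and the delegate is bound to an instance of it.

```csharp
using System;

// A counter built from a lambda that captures a local variable. The CLR has no
// closure instruction, so the compiler hoists 'count' into a field of a hidden,
// compiler-generated class and binds the delegate to an instance of it.
Func<int> MakeCounter()
{
    int count = 0;
    return () => ++count;   // captures the variable itself, not a snapshot of its value
}

Func<int> first = MakeCounter();
Func<int> second = MakeCounter();
first();
first();
Console.WriteLine(first());    // 3
Console.WriteLine(second());   // 1: each MakeCounter call gets its own 'count'

// The delegate's Target is the hidden environment object; each counter has its own.
Console.WriteLine(object.ReferenceEquals(first.Target, second.Target));   // False
```

This is roughly the same trick Parrot-targeting compilers could use in the absence of VM support: represent the captured environment as an ordinary heap object.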