I've been thinking about the possibility of building a higher-level
VM. The current VM is very close to a traditional CPU. What if we
did something non-traditional that made implementing higher-level
lexically scoped languages easier?

What if the VM (and assembler) had lexical scoping built-in at
the instruction level? Each instruction could be tagged with a
scope and if the scope changes, the VM could automatically fixup
the environment.

The current dispatch loop is something like:

  while (code)
     code = dispatch(code)

I'm proposing:

  while (code)
     if (code.scope != current.scope)
        fixup_environment(code)
     code = dispatch(code)

If we include event checking in the dispatch loop, the scope change
check is free because we can combine the two into a single test --
scope 0 would be reserved for "event pending".

The fixup_environment() routine is very simple for the normal
enter/exit one scope case and the complexity of multi-level scope
changes is handled in one place, not spread through the branching
ops or compiler. Branching into a scope, calling nested functions,
and taking a closure would all be much easier to write.

Lexically scoped instructions would also permit another possibility:
a stack-less *and* register-less architecture. The VM could use
frame offsets for everything -- the only operands needed in the
assembler are local and global variables.

Scoped assembler would look something like:

  frame(size) {
    ops ...
    frame(size) {
      ops ...
    }
  }

Local variables would use "L[frame][offset]" syntax. I see major
benefits in requiring the frame and offset values to be constants,
but they don't need to be. We'd lose the ability to statically check
code, but the dynamic lexical insertion feature would be easier to
implement. The frame size would be very easy to change too since
it's external to the instruction stream.

Global variables would use "G[name]".

The frames can be easily allocated in a small (L2 cache sized)
arena that would be just as fast as Parrot registers. If the GC
detects that a local variable has escaped, the frame just spills
into another arena, i.e. the first allocation is generation-0 and
the copy is generation-1. The compiler could even record scopes
that have escaping variables and allocate them in generation-1 to
begin with.

In summary, here's the advantages I see in a scoped VM:

  1. easier to generate code because the VM automatically
     handles entering/exiting scopes.

  2. easier to optimize scope allocations (or avoid
     allocations completely) because the instruction
     count does not change -- only the scope tags and
     frame offsets of local variables change.

  3. easier to implement higher-level features such as
     taking closures.

  4. lower instruction counts because the compiler
     doesn't need to generate enter/exit scope
     operations.

  5. performance might improve due to reducing the
     instruction count, reducing special cases in the
     branch opcodes and improving cache usage (copying
     a value into a register causes unnecessary cache
     pressure).

  6. the VM is simpler to specify -- no registers, no
     register stacks and no caller/callee saves.

The problems I see are:

  1. some languages might not need Perl 5 lexical scoping
     behavior (C for example does not allocate a frame
     for every scope -- scopes share a single frame).

  2. the test in the dispatch loop slows things down if
     event testing isn't needed. (But even though MIPS
     may be lower, "real" code may not be because the
     instruction count is lower.)

  3. a higher-level VM does not have a nice simple mapping
     to real hardware, so native code JIT will be more
     difficult.

  4. data sizes must be standardized so that offsets in
     bytecode files are portable -- otherwise we'll need
     another level of indirection which will be slower
     than Parrot registers. (If we don't care about
     mmap()ing the bytecode, then this isn't an issue.)

  5. bytecode files will probably be larger because the
     instruction count won't decrease enough to offset
     the extra tag on each instruction.

My idea might be too radical, but I want to see what everybody
thinks before we get too much code written.

- Ken

Reply via email to