Ken Fox wrote:
>
> I was noodling around with a couple different VM implementations
> and thought I would ask other people if they were doing the same.
> It would be nice to divide and conquer the exploration of different
> implementations.
>
> I looked into 3 different dispatching techniques. Here's the list
> with timings on a 600 MHz Pentium III running Linux and gcc 2.95:
>
> -g-O
> Switch/case6.0 sec 2.0 sec
> Computed goto (gcc feature)6.1 sec 1.4 sec
> Function call 11.9 sec 11.8 sec
Cool!
I have the same machine (600MHz PIII) running Linux but using a
different gcc:
% gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/specs
gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
and I get the same numbers for -O (I used -O3) but different numbers
without optimization. Maybe we should assume optimization?
I implemented a few variants of the function calling version.
FUNCALL_PREDICTABLE is exactly the same as your USE_FUNCALL except
instead of jumping directly through a function pointer, it does a nested
else-if statement. I had a vague idea that this would help branch
prediction, without losing the flexibility because you could always have
a final default 'else' that used a function pointer for unexpected
functions. It also allows the compiler to inline the function bodies.
(For completeness, I should really put the bodies in a separate
compilation unit; I'm not sure whether it inlined them or not. Asking
them to be inlined showed a small effect, so probably not? Or I could
just disassemble.) That got a big speedup: 10.2sec -> 4.9sec. So maybe
keeping things as separate functions isn't so harmful after all, though
that computed goto is very tempting. (But before you believe this branch
prediction BS, see below.)
Since function addresses aren't constant in gcc, that requires a nested
else-if construct. I took a shot at comparing the linear search with a
switch, but this is probably a ridiculous test because of the tiny
number of opcodes. The comparison is between FUNCALL_HYBRID and
FUNCALL_HYBRID_LINEAR.
FUNCALL_NAIVE looks at what happens if you don't rewrite opcode ids with
function pointers, and instead do a lookup for every opcode. It slows it
down some, though the whole thing is so slow it's not a big deal
percentage-wise.
FUNCALL_HYBRID is an attempt to use a switch with separate functions,
avoiding function pointers. It gives the same bottom line: on the PIII,
(1) function pointers are expensive, and (2) the compiler doesn't do a
very good job of inlining (with inlining on, FUNCALL_HYBRID should be
the same as SWITCH, but it's half the speed.)
I'm not sure I really buy the branch prediction argument, so I tried
another version FUNCALL_PREDICTABLE_HALF with half of the opcodes
replaced with function pointers. So branch prediction should work almost
as well since there's still always a single destination for every IP
value (although there's still an extra level of indirection -- major
handwaving here), but things go through a function pointer half the
time. It slows things down some, but much less than half the difference
between function pointers and no function pointers, so it seems to
support the branch prediction argument. But I'm way too far out on a
limb to believe anything I'm saying any more. :-)
Algorithm Inline? OptimizationTime (sec)
GOTO- -O3 1.35
GOTO- none5.87
SWITCH - -O3 2.19
SWITCH - none5.72
FUNCALL_ORIGno -O3 10.15
FUNCALL_ORIGno none12.37
FUNCALL_ORIGyes -O3 10.16
FUNCALL_ORIGyes none12.37
FUNCALL_NAIVE no -O3 11.66
FUNCALL_NAIVE no none13.51
FUNCALL_NAIVE yes -O3 11.67
FUNCALL_NAIVE yes none13.51
FUNCALL_HYBRID no -O3 4.21
FUNCALL_HYBRID no none10.08
FUNCALL_HYBRID yes -O3 4.20
FUNCALL_HYBRID yes none10.05
FUNCALL_HYBRID_LINEAR no -O3 4.67
FUNCALL_HYBRID_LINEAR no none10.06
FUNCALL_HYBRID_LINEAR yes -O3 4.67
FUNCALL_HYBRID_LINEAR yes none10.07
FUNCALL_PREDICTABLE no -O3 4.92
FUNCALL_PREDICTABLE no none9.85
FUNCALL_PREDICTABLE yes -O3 4.87
FUNCALL_PREDICTABLE yes none9.84
FUNCALL_PREDICTABLE_HALFno