>> I propose you and I work together to make a
>> totally Forth-language agnostic Forth
>> micro-kernel.  This kernel can be very
>> minimalistic: a stack, a machine state hash,
>> and definitions for the words "code", "next",
>> "word", and "'" (tick) all having standard
>> Forth
>> behavior, a simple dictionary and a simple
>> eval
>> loop.
>
> I'll reply to this portion of your email later,
> when I get time to
> think and to look at the Parakeet code.

Okay, note that the code I mentioned (the
separation of core from core words) is not
checked in right now, but the version in CVS
does do NCG.

>> Some Parakeet ideas might also be used in your
>> code, for example, it looks to me like your
>> code
>> does direct threading:
>
> ...
>
>> Direct threading is a common Forth
>> implementation technique, but it was most
>> often
>> used because it could be implemented portably
>> in
>> C with only a small bit of asm.  For smaller
>> ops
>> like @ !, math ops, and many others, it is
>> more
>> optimal to use direct code generation to
>> "inline" the PIR code itself instead of
>> inlining an invoke to the PIR code compiled
>> as
>> a sub.
>
> ...
>
>> resulting in a lot less overhead for core
>> words.
>>  NCG was usually either a commercial feature
>> or
>> rarely seen in Forth because it was
>> non-portable, being written in ASM, and
>> expensive to maintain on multiple platforms.
>> We can kick that problem out the door.
>
> I'm not sure that's right. I did think about
> putting the code inline
> (and it would be a trivial change to do so), but
> I'm not convinced it
> would be faster. Yes, you wouldn't have to deal
> with the overhead
> involved with making subroutine calls, but IMCC
> would also have to
> re-parse and re-compile the code every time.

But only at compile time or interactive
interpretation time.  Not at runtime.  Consider
the code typed into the Forth interpreter:

2 dup * .

would of course print '4'.  You're correct that
using NCG this would require compiling new PIR
every time it is typed in, but *only* when you
are working interactively.  The time it takes to
do this is infinitesimal compared to the time
it takes to type it in.

For an already compiled word being executed,
however, NCG is *much* faster than calling
subroutines.  Consider:

: square dup * ;

in pseudo-PIR, given the definitions of dup and *:

.sub dup
  .POP
  .NOS = .TOS
  .PUSH2
.end

.sub mul
  .POP2
  .TOS = .TOS * .NOS
  .PUSH
.end

Using direct threading, this would result in the
execution of:

  find_global $P0, "dup"
  invoke $P0
  find_global $P0, "mul"
  invoke $P0

In NCG it would result in the execution of:

  .POP
  .NOS = .TOS
  .PUSH2            # this can be optimized out
  .POP2             # of NCG, but not direct threading
  .TOS = .TOS * .NOS
  .PUSH
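
To make the "optimized out" comment concrete, here is a minimal
sketch in Python (not PIR) of a peephole pass over the inlined
macro sequence.  It assumes, as the comment above suggests, that
a .PUSH2 immediately followed by a .POP2 is a no-op; the names
CANCELLING and peephole are hypothetical and not part of
Parakeet.

```python
# Toy peephole pass over a list of macro names (an illustration only).
# Assumption: spilling two cells and immediately reloading them cancels out.
CANCELLING = {(".PUSH2", ".POP2")}

def peephole(ops):
    """Drop adjacent macro pairs that cancel each other."""
    out = []
    for op in ops:
        if out and (out[-1], op) in CANCELLING:
            out.pop()          # remove the spill; skip the matching reload
        else:
            out.append(op)
    return out

# The inlined body of "square" as listed above.
square = [
    ".POP",
    ".NOS = .TOS",
    ".PUSH2",
    ".POP2",
    ".TOS = .TOS * .NOS",
    ".PUSH",
]

optimized = peephole(square)
```

A pass like this only works because the whole body is visible in
one place at compile time.  With direct threading the .PUSH2 and
.POP2 live in two separate subs, so no such pass can see them as
a pair.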

Now call this word from a loop:
: square_to_thousand
  1000 0 do
    i square .
  loop
;

Using the direct threading model, this does 2000
global lookups and subroutine invokes, which in
turn do the actual "work" of 1000
multiplications and the associated stack
traffic.  The lookups and invokes are pure
inner-loop overhead.

Using NCG this does 1000 multiplications and the
associated stack traffic (which can be optimized
out for the most part) with no lookups or
invokes.

The overhead of direct threading vs. NCG does not
need to be benchmarked, it can be proven by
argument: both methods execute the same code the
same way, but the NCG method does 2000 fewer
global lookups and invokes.
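
The counting argument can be checked with a toy model.  The
sketch below is Python, not PIR, and everything in it
(SNIPPETS, compile_word, find_global, invoke, counters) is a
hypothetical stand-in for illustration.  Both versions of
square are built from the same snippets; the threaded one looks
up and invokes a sub per word at run time, while the inlined
one is generated once with both bodies pasted together.

```python
# Toy model of the two strategies; not Parakeet's actual code.
SNIPPETS = {
    "dup": "stack.append(stack[-1])",
    "mul": "b = stack.pop(); a = stack.pop(); stack.append(a * b)",
}

def compile_word(name, words):
    """Generate one function whose body is the given snippets, inlined."""
    body = "\n".join("    " + SNIPPETS[w] for w in words)
    namespace = {}
    exec(f"def {name}(stack):\n{body}", namespace)
    return namespace[name]

# Threaded model: each primitive is its own sub, found by name at run time.
GLOBALS = {w: compile_word(w, [w]) for w in SNIPPETS}
counters = {"lookups": 0, "invokes": 0}

def find_global(name):
    counters["lookups"] += 1
    return GLOBALS[name]

def invoke(sub, stack):
    counters["invokes"] += 1
    sub(stack)

def square_threaded(stack):
    invoke(find_global("dup"), stack)
    invoke(find_global("mul"), stack)

# NCG model: the word is compiled once, with both bodies inlined.
square_inline = compile_word("square", ["dup", "mul"])

def run(square):
    """Model of square_to_thousand: square each i from 0 to 999."""
    results = []
    for i in range(1000):
        stack = [i]
        square(stack)
        results.append(stack.pop())
    return results
```

Running run(square_threaded) bumps both counters 2000 times;
running run(square_inline) leaves them untouched, and the two
calls return the same list of squares.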

The "extra" compiler overhead is trivial, and it
only applies at compile time, generally when a
program is started.  At run time (when all those
lookups and invokes are happening in the direct
threading case) there is no additional compilation
overhead because a word is compiled only once.

Almost all other Forths that you may see use either
direct or indirect threading; this is not because
it is faster (it isn't) or simpler (not much),
but because it is portable and requires little or
no asm.  If there were only one assembly
language in the world then NCG would be the
*only* way to write a Forth interpreter;
threading of any kind wouldn't make sense.

-Michel
