> This still doesn't seem right. The compilation
> from Forth to PIR only
> happens once, yes. But each time the defined
> word is used, the PIR
> code, which is injected, must be compiled to
> bytecode.

Right.

> The second PIR sequence is longer. It will take
> IMCC more time to
> compile that than the first example. As the
> words become less trivial,
> this will become more true.

But one can't weigh the compile-time overhead
against the run-time overhead like that.  What
if your inner loop runs for many days, while
compilation of the NCG code takes only a couple
of milliseconds more?

> But like you said, this only happens at (a)
> compile time or (b) at the
> interactive prompt.

Right.

> And optimizing out push/pop
> combos will speed
> things up more, though I'm not sure how to
> implement that optimization
> using PIR.

Well, that's a good question, because I haven't
done it yet.  The simple hack approach is to
pattern-replace back-to-back PUSH/POP pairs
right out with string search and replace.  I
estimate this could eliminate up to half of the
data stack traffic on average, depending on the
code of course.
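
Roughly what I have in mind, sketched in Python
(the PUSH/POP/MOVE mnemonics are placeholders,
not the actual generated PIR):

# Peephole pass: collapse a PUSH that is immediately followed by a POP
# into a register move (or into nothing if both name the same register).
def collapse_push_pop(lines):
    out = []
    for line in lines:
        op = line.split()
        if op[:1] == ["POP"] and out and out[-1].split()[:1] == ["PUSH"]:
            src = out.pop().split()[1]   # register that was pushed
            dst = op[1]                  # register the POP targets
            if src != dst:
                out.append("MOVE %s, %s" % (dst, src))
        else:
            out.append(line)
    return out

print(collapse_push_pop(["PUSH R1", "POP R2", "PUSH R3", "POP R3"]))
# -> ['MOVE R2, R1']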

The whole-hog way to do it would be to have a
stack data analysis algorithm try to cache as
much of the stack in P registers as possible,
spilling when necessary.  This could probably
eliminate almost all stack data traffic except
for extremely pathological cases.
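
Again only a sketch, in Python and with invented
mnemonics; a real pass would have to copy pushed
values into fresh temporaries and worry about
control flow, which I'm waving away here:

# Keep a compile-time model of the top of the data stack as a list of
# registers.  PUSHes go into the model, POPs become register moves, and
# anything we can't analyse forces a spill back to the real stack.
def cache_stack(lines):
    out, cache = [], []              # cache[-1] models top-of-stack
    def spill():
        while cache:
            out.append("PUSH %s" % cache.pop(0))   # bottom-most first
    for line in lines:
        op = line.split()
        if op[0] == "PUSH":
            cache.append(op[1])      # don't touch memory yet
        elif op[0] == "POP" and cache:
            out.append("MOVE %s, %s" % (op[1], cache.pop()))
        elif op[0] == "POP":
            out.append(line)         # nothing cached, do a real pop
        else:
            spill()                  # unknown op: be conservative
            out.append(line)
    spill()                          # stack must be real at the end
    return out

print(cache_stack(["PUSH R1", "PUSH R2", "POP R3", "CALL foo", "POP R4"]))
# -> ['MOVE R3, R2', 'PUSH R1', 'CALL foo', 'POP R4']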

The Python interpreter could use this method too
to really spank CPython, which has implicit
stack traffic that cannot be easily optimized
out.

> Furthermore, our two models will behave
> differently when you redefine
> a word. Consider this Forth example:
>
>  : inc 1 + ;
>  : print+ inc . ;
>  : inc 2 + ;
>
> Should print+ increment by one or by two? gforth
> increments by one.

I've made a pretty big mistake so far by calling
indirect threading direct threading.  The
lookup/invoke sequence is really indirect
threading.  Direct threading would be if we
could somehow compile into PIR a literal
"pointer" to a sub instead of having to look it
up by name.  I don't think this can be done.

gforth keeps the old behavior because it uses
direct threading: the pointer never changes
inside the compiled body of print+ even though
the word is later redefined.
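
The difference is easy to see in a Python
analogy (nothing to do with the actual gforth or
Parakeet internals, just the two dispatch styles
side by side):

# A word dictionary, as in ": inc 1 + ;"
dictionary = {"inc": lambda n: n + 1}

# Indirect threading: the compiled body of print+ only holds the *name*.
def print_plus_indirect(n):
    return dictionary["inc"](n)

# Direct threading: the compiled body holds the definition itself.
inc_at_compile_time = dictionary["inc"]
def print_plus_direct(n):
    return inc_at_compile_time(n)

# Redefine the word, as in ": inc 2 + ;"
dictionary["inc"] = lambda n: n + 2

print(print_plus_indirect(1))   # 3, sees the new definition
print(print_plus_direct(1))     # 2, keeps the old one, like gforth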

In the case of words defined with ":" and ";",
even NCG still does indirect threading via a
lookup and invoke.  NCG only inlines CORE word
definitions, the words that are defined in PIR
and form the basis for all high-level words; the
high-level words themselves are still indirect
threaded.

I should have mentioned this before, but this
doesn't invalidate my previous example:

: square dup * ;

: square_to_thousand
  1000 0 do
    i square .
  loop
;

1000 lookups and invokes are still required to
find the high-level word "square" under either
indirect threading or NCG (and direct threading
still requires 1000 invokes), but 2000 lookups
and invokes are eliminated from the inner loop
with NCG because "dup" and "*", which are core
words, are inlined.
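
Just to spell out the arithmetic:

# Dispatch work inside the 1000-iteration loop of square_to_thousand:
iterations = 1000
indirect = iterations * 3   # "square", "dup" and "*" each looked up and invoked
ncg      = iterations * 1   # only "square"; "dup" and "*" are inlined
print(indirect - ncg)       # 2000 lookup/invoke pairs eliminated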

Whether or not an old definition is retained
when a word is redefined is a different
question.  In the case of Parakeet, print+ will
increment by two, because all high-level words
are looked up by name at run time via indirect
threading.

> I'd be interested in knowing which was the
> "correct" behavior.

I suspect it is implementation defined, but
unfortunately taygeta.com is not working for me
right now.

-Michel
