> This still doesn't seem right. The compilation from Forth to PIR only
> happens once, yes. But each time the defined word is used, the PIR
> code, which is injected, must be compiled to bytecode.
Right.

> The second PIR sequence is longer. It will take IMCC more time to
> compile that than the first example. As the words become less trivial,
> this will become more true.

But one can't weigh the compile-time overhead against the run-time
overhead like that. What if your inner loop runs for many days, while
compilation of the NCG code takes only a couple of milliseconds more?

> But like you said, this only happens at (a) compile time or (b) at the
> interactive prompt.

Right.

> And optimizing out push/pop combos will speed things up more, though
> I'm not sure how to implement that optimization using PIR.

Well, that's a good question, because I haven't done it yet. The simple
hack approach is to pattern-replace back-to-back PUSH/POP pairs right
out with string search and replace. I estimate this could eliminate up
to half of the data stack traffic on average, depending on the code of
course. The whole-hog way to do it would be to have a stack data-flow
analysis try to cache as much of the stack in P registers as possible,
spilling when necessary. This could probably eliminate almost all stack
data traffic except for extremely pathological cases. The Python
interpreter could use this method too to really spank CPython, which
has implicit stack traffic that cannot be easily optimized out.

> Furthermore, our two models will behave differently when you redefine
> a word. Consider this Forth example:
>
> : inc 1 + ;
> : print+ inc . ;
> : inc 2 + ;
>
> Should print+ increment by one or by two? gforth increments by one.

I've made a pretty big mistake so far by calling indirect threading
direct threading. The lookup/invoke sequence is really indirect
threading. Direct threading would be if we could somehow compile into
PIR a literal "pointer" to a sub instead of having to look it up by
name; I don't think this can be done. gforth keeps the old behavior
because it uses direct threading: the pointer compiled into the body of
print+ never changes, even though the word definition later does.

Even with NCG, words defined with ":" and ";" are still indirect
threaded via a lookup and invoke. NCG only inlines CORE word
definitions, the words that are defined in PIR and form the basis for
all high-level words; the high-level words themselves remain indirect
threaded. I should have mentioned this before, but it doesn't
invalidate my previous example:

: square dup * ;
: square_to_thousand 1000 0 do i square . loop ;

1000 lookups and invokes are still required to find the high-level word
"square" under either indirect threading or NCG (and direct threading
still requires the 1000 invokes), but 2000 lookups and invokes are
still eliminated from the inner loop with NCG, because "dup" and "*",
which are core words, are inlined.

Whether an old definition is retained when a word is redefined is a
separate question. In Parakeet, print+ will increment by two, because
all high-level words are looked up by name at run time via indirect
threading.

> I'd be interested in knowing which was the "correct" behavior.

I suspect it is implementation defined, but unfortunately taygeta.com
is not working for me right now.

-Michel
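
P.S. Here is a minimal sketch of the "simple hack" push/pop elimination
described above. It is plain Python rather than PIR, and the
"push"/"pop" spellings and the function name are placeholders rather
than Parakeet's actual emitted code; the point is only the shape of the
peephole pass: scan the generated text, and whenever a push is
immediately followed by a pop, drop both and keep the data flow as a
plain move.

    import re

    # Placeholder patterns; real output would use whatever instructions
    # the code generator actually emits for the data stack.
    PUSH = re.compile(r"^\s*push\s+(\S+)\s*$")
    POP  = re.compile(r"^\s*pop\s+(\S+)\s*$")

    def squash_push_pop(lines):
        """Drop back-to-back push/pop pairs, turning 'push X' / 'pop Y'
        into a plain move when X and Y differ."""
        out = []
        i = 0
        while i < len(lines):
            m_push = PUSH.match(lines[i])
            m_pop = POP.match(lines[i + 1]) if i + 1 < len(lines) else None
            if m_push and m_pop:
                src, dst = m_push.group(1), m_pop.group(1)
                if src != dst:
                    # keep the value, skip the stack round trip
                    out.append("    %s = %s" % (dst, src))
                i += 2  # both stack ops disappear
            else:
                out.append(lines[i])
                i += 1
        return out

    print("\n".join(squash_push_pop([
        "    push $I0",
        "    pop $I1",
        "    add $I1, 1",
    ])))
    # prints:
    #     $I1 = $I0
    #     add $I1, 1

The whole-hog register-caching version would replace this with a real
data-flow pass, but even naive pair-squashing catches the common case
where one word's result feeds straight into the next word.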
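
P.P.S. And a rough model, again in Python purely for illustration, of
why the two schemes disagree on the inc / print+ example: under direct
threading the compiled body of print+ holds a reference captured when
it was compiled, while under indirect threading (what Parakeet does) it
looks the word up by name every time it runs.

    # Toy Forth dictionary: word name -> callable.
    dictionary = {}

    # : inc 1 + ;
    dictionary["inc"] = lambda n: n + 1

    # Direct threading: print+ captures the definition of inc as it
    # exists right now.
    inc_ref = dictionary["inc"]
    def print_plus_direct(n):
        print(inc_ref(n))

    # Indirect threading: print+ looks inc up by name at run time.
    def print_plus_indirect(n):
        print(dictionary["inc"](n))

    # : inc 2 + ;
    dictionary["inc"] = lambda n: n + 2

    print_plus_direct(1)    # 2 -- old definition, the gforth behavior
    print_plus_indirect(1)  # 3 -- new definition, the Parakeet behavior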