I'll split my replies into separate threads to make it easier to wrap
our brains around individual chunks.
Patrick R. Michaud wrote:
Clear boundaries between components: (Fuzzy boundaries of abstraction
make it difficult to allow for other implementations of the AST/OST or
customization of the compiler object.)
- The 'compile' method doesn't belong in the PAST object, it belongs in
HLLCompiler.
...
After a lot of thought and false starts, I ended up taking a
different approach to compilation than the "HLLCompiler specifies
the complete sequence of transformations". Essentially I've taken
the approach that a "compiler" is simply something that transforms
a source data structure into a target data structure, and so
what we really have is a sequence of "compilers". To this end,
I really wanted to call my compiler base class 'Compiler'
and not 'HLLCompiler', but unfortunately that classname is already
used by Parrot for something else and so 'HLLCompiler' is what I
chose until that could be resolved. The 'HLL' probably implies
more than I intended to imply.
So, the 'Abc' compiler really is just something that converts the
'bc' language into a PAST structure; after doing that it simply
hands the result off to the 'PAST-pm' compiler. Similarly,
the 'PAST' compiler translates into POST and hands the result
off to the POST compiler, and POST simply does its thing and
returns a PIR or executable result.
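To make the "sequence of compilers" idea concrete, here is a minimal
sketch in Python rather than PIR. It is only an illustration under
stated assumptions: the StageCompiler class, its 'target' argument, and
the toy string transformations are all hypothetical and are not part of
Parrot or HLLCompiler.

```python
# Illustrative sketch only: these names are hypothetical, not Parrot's API.

class StageCompiler:
    """Transforms a source structure into a target structure, then hands
    the result to its default next stage, if there is one."""

    def __init__(self, name, transform, next_stage=None):
        self.name = name
        self.transform = transform
        self.next_stage = next_stage

    def compile(self, source, target=None):
        result = self.transform(source)
        # Stop if the caller asked for this stage's output, or if there
        # is no further stage to hand off to.
        if target == self.name or self.next_stage is None:
            return result
        return self.next_stage.compile(result, target)


# Toy transformations standing in for the real work of each stage.
post = StageCompiler("post", lambda ost: "pir(" + ost + ")")
past = StageCompiler("past", lambda ast: "post(" + ast + ")", next_stage=post)
abc = StageCompiler("abc", lambda src: "past(" + src + ")", next_stage=past)

print(abc.compile("1+2"))                 # pir(post(past(1+2)))
print(abc.compile("1+2", target="past"))  # post(past(1+2))
```

In this sketch each stage knows only its own transformation and its
default successor, which is the property the multi-stage approach
relies on.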
Let's take a couple steps back. The compiler module is really like
Test::Builder. It's the infrastructure code that provides standard
functionality to all compiler writers. Standardization is good: it means
we don't have 500 incompatible implementations of 'ok'. (Actually, we
still have non-standard implementations of 'ok' floating around, and
they're a major headache. All the more reason to standardize the
compiler tools early on.)
With tests, each test file does one thing (tests a chunk of code, says
'ok' or 'not ok' multiple times). The individual tests don't need to
each duplicate the infrastructure code. Test::Harness provides the
infrastructure, progresses through all the tests, maintains
meta-information as it goes, and summarizes at the end.
With compiler modules, the individual PGE and TGE modules each do one
thing: take in the "source code" in one form and output it in another
form. There's no need to re-write the infrastructure code into the
syntax tree modules for every stage of compilation. Let
Compiler::Builder (or Compiler::Harness, or whatever we call it) handle
the infrastructure.
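A rough sketch of that division of labor, again in Python for
illustration only. CompilerBuilder here is an invented class, and the
three toy stage functions are assumptions standing in for PGE and TGE
modules; nothing below is an existing Parrot interface.

```python
# Hypothetical sketch: CompilerBuilder is an invented name, not a real module.

class CompilerBuilder:
    """Shared driver: owns the stage list and runs the stages in order."""

    def __init__(self, language, stages):
        self.language = language
        self.stages = list(stages)  # (stage_name, callable) pairs

    def compile(self, source, target=None):
        result = source
        for name, stage in self.stages:
            result = stage(result)
            if name == target:  # caller asked to stop after this stage
                break
        return result


# Each stage does one thing: one form in, another form out.
def parse(source):
    return source.split()

def to_ast(tokens):
    return {"op": tokens[1], "args": [tokens[0], tokens[2]]}

def to_code(ast):
    return "{} {}, {}".format(ast["op"], *ast["args"])


toy = CompilerBuilder("toy", [("parse", parse), ("ast", to_ast), ("code", to_code)])
print(toy.compile("1 add 2"))  # add 1, 2
```

The stage functions contain no driver logic at all; sequencing and
early exit live in one place.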
- The 'compile' method also doesn't belong in the main compiler
executable, it belongs in HLLCompiler.
- Merge them into one 'compile' method in HLLCompiler.
- Customization of HLLCompiler should be handled by creating a subclass
of HLLCompiler. (The current 'register' strategy is somewhat fragile.)
I don't have any problem with having each language subclass
HLLCompiler and override the 'compile' method in each; I'll
work on that soon. Of course, the method still ends up one way
or another in the main compiler executable; it may simply change
the namespace.
The point is that 99% of compiler writers shouldn't need to write any
code for the 'compile' method at all.
- Provide an 'init' method for HLLCompiler that lets the compiler writer
set which modules HLLCompiler will use for each stage of compilation.
This will cover the majority of compilers without requiring each
compiler writer to define their own 'compile' routine.
Because of the multi-stage approach I've taken, the compile
routines are already fairly short, and to me they're not at all
onerous for a compiler writer to create. For each of languages/abc/,
languages/APL/, and languages/perl6/, the 'compile' method is
fewer than 30 lines of PIR. (And it will only require a couple
of lines of code to abstract the existing call to 'compile' methods
of PAST/POST to instead use PAST/POST compilers.)
a) Most compilers will simply cut-n-paste an existing 'compile' routine
from an existing compiler. Cut-n-paste programming is a "code smell" and
a maintenance headache.
b) Why require the compiler writer to write 30 lines of code when they
could write one? The entire core executable for a compiler could consist
of nothing but:
    .sub '__onload' :load :init
        # load your modules
        $P1 = new [ 'HLLCompiler' ]
        $P1.'init'('language'=>'punie', 'parse_grammar'=>'Punie::Parser',
                   'ast_grammar'=>'Punie::AST::Grammar')
    .end

    .sub 'main' :main
        .param pmc args
        $P0 = compreg 'punie'
        $P1 = $P0.'command_line'(args)
        .return ($P1)
    .end
That's a great selling point to new compiler writers. (And I'd be even
happier if we could export the 'main' routine from HLLCompiler instead
of cut-n-pasting it.)
I also think that many compilers may end up with compiler-specific
option flags or other items that need to be taken care of, and it
seems to me that this is more easily handled by a method definition
than a module specification.
Some will, but subclassing Compiler::Builder is a familiar and
straightforward process, and will give them all the flexibility they
need to customize its behavior, not just the 'compile' routine. Optimize
for the common case, be flexible enough for the complex case.
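As a sketch of how a subclass could carry a compiler-specific option
flag without every compiler rewriting 'compile': the Python below is
illustrative only. CompilerBuilder, handle_option, and the 'shout' flag
are hypothetical names chosen for the example, not anything defined by
HLLCompiler.

```python
# Hypothetical sketch; none of these names come from Parrot.

class CompilerBuilder:
    """Common case: no language-specific options, stages run in order."""

    def __init__(self, stages):
        self.stages = list(stages)

    def handle_option(self, name, value):
        # The base class knows no language-specific options.
        raise ValueError("unknown option: " + name)

    def compile(self, source, **options):
        for name, value in options.items():
            self.handle_option(name, value)
        result = source
        for stage in self.stages:
            result = stage(result)
        return result


class ShoutingCompiler(CompilerBuilder):
    """Complex case: a subclass that adds one compiler-specific flag."""

    def handle_option(self, name, value):
        if name == "shout":
            self.shout = bool(value)
        else:
            super().handle_option(name, value)

    def compile(self, source, **options):
        self.shout = False  # reset the flag for each compilation
        result = super().compile(source, **options)
        return result.upper() if self.shout else result


plain = CompilerBuilder([str.strip])
custom = ShoutingCompiler([str.strip])
print(plain.compile("  say hi "))
print(custom.compile("  say hi ", shout=True))
```

The compiler with no special needs uses the base class untouched; only
the one that wants an extra flag overrides anything.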
(If the parser grammar module were specified in
HLLCompiler's 'init', then the compiler object would know where to look
for the optable.)
I'm thinking this is really a parameter to the AST compiler...
It's infrastructure code. Any stage of compilation may need access to
the optable, so the information on where to find it belongs in the
meta-object that is governing all the compilation stages. (Generating
the optable I'll leave for a different thread.)
- In HLLCompiler, split the 'compile' method out into independent
methods for each compilation stage ('compile_ast', 'compile_ost',
'compile_pir', etc.), all called from 'compile'.
Again, I tend to think of this as being all separate compilers,
each of which automatically calls its default next stage until
compiler options tell it to do otherwise.
Standardized infrastructure code good. Make Ogg-itect happy. :)
Once we have a standardized infrastructure, it opens up lots of
possibilities. Like, how about a subclass of Compiler::Builder that
accumulates statistics about the time spent on each stage of compilation
and reports it at the end of the compile? Or language smoke-testing
reports on the website broken down by compile stage? ("This test was
successful through the POST stage, but this one never made it through
the parse.")
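The timing idea could look roughly like the following Python sketch. It
assumes a base class that dispatches to one method per stage (in the
spirit of 'compile_ast', 'compile_ost', 'compile_pir'); the class names,
the run_stage hook, and the placeholder stage bodies are all
hypothetical.

```python
# Hypothetical sketch: per-stage methods plus a subclass that times them.
import time


class CompilerBuilder:
    STAGES = ("parse", "ast", "ost", "pir")

    def compile(self, source):
        result = source
        for stage in self.STAGES:
            result = self.run_stage(stage, result)
        return result

    def run_stage(self, stage, data):
        # Dispatch to the independent method for this stage.
        return getattr(self, "compile_" + stage)(data)

    # Placeholder stages: each just tags the data it was given.
    def compile_parse(self, data):
        return data + ">parse"

    def compile_ast(self, data):
        return data + ">ast"

    def compile_ost(self, data):
        return data + ">ost"

    def compile_pir(self, data):
        return data + ">pir"


class TimedCompiler(CompilerBuilder):
    """Accumulates the time spent in each stage."""

    def __init__(self):
        self.timings = {}

    def run_stage(self, stage, data):
        start = time.perf_counter()
        try:
            return super().run_stage(stage, data)
        finally:
            # Recorded even if the stage raises, so a report can show
            # how far a failed compile got.
            self.timings[stage] = time.perf_counter() - start

    def report(self):
        return ["{}: {:.6f}s".format(stage, seconds)
                for stage, seconds in self.timings.items()]


compiler = TimedCompiler()
compiler.compile("src")
print("\n".join(compiler.report()))
```

Because every stage goes through the same hook, the subclass needs no
knowledge of what any individual stage does.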
Allison