I'll split my replies into separate threads to make it easier to wrap our brains around individual chunks.

Patrick R. Michaud wrote:

Clear boundaries between components: (Fuzzy boundaries of abstraction make it difficult to allow for other implementations of the AST/OST or customization of the compiler object.)

- The 'compile' method doesn't belong in the PAST object, it belongs in HLLCompiler.
...

After a lot of thought and false starts, I ended up taking a
different approach to compilation than the "HLLCompiler specifies
the complete sequence of transformations".  Essentially I've taken
the approach that a "compiler" is simply something that transforms
a source data structure into a target data structure, and so
what we really have is a sequence of "compilers".  To this end,
I really wanted to call my compiler base class 'Compiler'
and not 'HLLCompiler', but unfortunately that classname is already
used by Parrot for something else and so 'HLLCompiler' is what I
chose until that could be resolved.  The 'HLL' probably implies
more than I intended to imply.

So, the 'Abc' compiler really is just something that converts the
'bc' language into a PAST structure; after doing that it simply hands
the result off to the 'PAST-pm' compiler.  Similarly,
the 'PAST' compiler translates into POST and hands the result
off to the POST compiler, and POST simply does its thing and
returns a PIR or executable result.
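
The chain described above can be sketched roughly as follows. This is a minimal illustration in Python rather than PIR, and the Compiler class, its constructor arguments, and the string-wrapping transforms are all hypothetical stand-ins, not Parrot's actual API:

```python
# Illustrative sketch only: hypothetical classes, not Parrot's API.
class Compiler:
    """Transforms a source structure into a target structure."""

    def __init__(self, name, transform, next_stage=None):
        self.name = name
        self.transform = transform      # callable: source -> target
        self.next_stage = next_stage    # default compiler to hand off to

    def compile(self, source):
        result = self.transform(source)
        if self.next_stage is None:
            return result
        # Hand the result off to the next compiler in the chain.
        return self.next_stage.compile(result)


# Stand-in transforms that just wrap their input in a label.
post = Compiler('POST', lambda ost: 'PIR(' + ost + ')')
past = Compiler('PAST', lambda ast: 'POST(' + ast + ')', next_stage=post)
abc = Compiler('abc', lambda src: 'PAST(' + src + ')', next_stage=past)

result = abc.compile('1 + 2')
```

Each compiler knows only its own transformation and its default successor, which is the property the multi-stage approach relies on.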

Let's take a couple steps back. The compiler module is really like Test::Builder. It's the infrastructure code that provides standard functionality to all compiler writers. Standardization is good: it means we don't have 500 incompatible implementations of 'ok'. (Actually, we still have non-standard implementations of 'ok' floating around, and they're a major headache. All the more reason to standardize the compiler tools early on.)

With tests, each test file does one thing (tests a chunk of code, says 'ok' or 'not ok' multiple times). The individual tests don't need to each duplicate the infrastructure code. Test::Harness provides the infrastructure, progresses through all the tests, maintains meta-information as it goes, and summarizes at the end.

With compiler modules, the individual PGE and TGE modules each do one thing: take in the "source code" in one form and output it in another form. There's no need to re-write the infrastructure code into the syntax tree modules for every stage of compilation. Let Compiler::Builder (or Compiler::Harness, or whatever we call it) handle the infrastructure.

- The 'compile' method also doesn't belong in the main compiler executable, it belongs in HLLCompiler.
- Merge them into one 'compile' method in HLLCompiler.
- Customization of HLLCompiler should be handled by creating a subclass of HLLCompiler. (The current 'register' strategy is somewhat fragile.)

I don't have any problem with having each language subclass
HLLCompiler and override the 'compile' method in each; I'll
work on that soon.  Of course, the method still ends up one way
or another in the main compiler executable; it may simply change
the namespace.

The point is that 99% of compiler writers shouldn't need to write any code for the 'compile' method at all.

- Provide an 'init' method for HLLCompiler that lets the compiler writer set which modules HLLCompiler will use for each stage of compilation. This will cover the majority of compilers without requiring each compiler writer to define their own 'compile' routine.

Because of the multi-stage approach I've taken, the compile
routines are already fairly short, and to me they're not at all
onerous for a compiler writer to create.  For each of languages/abc/,
languages/APL/, and languages/perl6/ the 'compile' method is
less than 30 lines of PIR.  (And it will only require a couple
of lines of code to abstract the existing call to 'compile' methods
of PAST/POST to instead use PAST/POST compilers.)

a) Most compilers will simply cut-n-paste an existing 'compile' routine from an existing compiler. Cut-n-paste programming is a "code smell" and a maintenance headache.

b) Why require the compiler writer to write 30 lines of code when they could write one? The entire core executable for a compiler could consist of nothing but:

.sub '__onload' :load :init
    # load your modules
    $P1 = new [ 'HLLCompiler' ]
    $P1.'init'('language'=>'punie', 'parse_grammar'=>'Punie::Parser', 'ast_grammar'=>'Punie::AST::Grammar')
.end
.sub 'main' :main
    .param pmc args
    $P0 = compreg 'punie'
    $P1 = $P0.'command_line'(args)
    .return ($P1)
.end

That's a great selling point to new compiler writers. (And I'd be even happier if we could export the 'main' routine from HLLCompiler instead of cut-n-pasting it.)
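
To make the idea concrete, here is a rough sketch of what such an 'init' might do behind that one call. It is written in Python purely for illustration; the registry dictionary (a stand-in for compreg), the stage callables, and the generic 'compile' loop are assumptions, not the actual HLLCompiler implementation:

```python
# Hypothetical sketch in Python, not Parrot's actual HLLCompiler.
class HLLCompiler:
    registry = {}  # stand-in for Parrot's compreg lookup

    def init(self, language, parse_grammar, ast_grammar):
        # Record which module handles each stage, then register
        # the configured compiler under the language name.
        self.language = language
        self.stages = [parse_grammar, ast_grammar]
        HLLCompiler.registry[language] = self
        return self

    def compile(self, source):
        # Generic driver: the compiler writer never has to write this.
        for stage in self.stages:
            source = stage(source)
        return source


# Stand-ins for Punie::Parser and Punie::AST::Grammar.
def punie_parser(source):
    return ('parse', source)


def punie_ast_grammar(match):
    return ('ast', match)


HLLCompiler().init(language='punie',
                   parse_grammar=punie_parser,
                   ast_grammar=punie_ast_grammar)
tree = HLLCompiler.registry['punie'].compile('print 1;')
```

The compiler writer supplies only the stage modules; the 'compile' loop lives once in the shared base class.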

I also think that many compilers may end up with compiler-specific
option flags or other items that need to be taken care of, and it
seems to me that this is more easily handled by a method definition
than a module specification.

Some will, but subclassing Compiler::Builder is a familiar and straightforward process, and will give them all the flexibility they need to customize its behavior, not just the 'compile' routine. Optimize for the common case, be flexible enough for the complex case.

(If the parser grammar module was specified in HLLCompiler's 'init', then the compiler object would know where to look for the optable.)

I'm thinking this is really a parameter to the AST compiler...

It's infrastructure code. Any stage of compilation may need access to the optable, so the information on where to find it belongs in the meta-object that is governing all the compilation stages. (Generating the optable I'll leave for a different thread.)

- In HLLCompiler, split the 'compile' method out into independent methods for each compilation stage ('compile_ast', 'compile_ost', 'compile_pir', etc.), all called from 'compile'.
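
A minimal sketch of that split, in Python for illustration only. The stage bodies are placeholders, and the 'target' option for stopping early is an assumption rather than an existing HLLCompiler parameter:

```python
# Hypothetical sketch: per-stage methods, all driven from 'compile'.
class HLLCompiler:
    STAGES = ('ast', 'ost', 'pir')

    def compile(self, source, target='pir'):
        if target not in self.STAGES:
            raise ValueError('unknown target stage: ' + target)
        result = source
        for stage in self.STAGES:
            # Dispatch to compile_ast, compile_ost, compile_pir in order.
            result = getattr(self, 'compile_' + stage)(result)
            if stage == target:
                break
        return result

    # Placeholder stage bodies; a subclass can override any one of them.
    def compile_ast(self, source):
        return ['ast', source]

    def compile_ost(self, ast):
        return ['ost', ast]

    def compile_pir(self, ost):
        return ['pir', ost]


compiler = HLLCompiler()
only_ast = compiler.compile('x', target='ast')
full = compiler.compile('x')
```

Because each stage is its own method, a subclass can replace one stage without touching the driver.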

Again, I tend to think of this as being all separate compilers,
each of which automatically calls its default next stage until
compiler options tell it to do otherwise.

Standardized infrastructure code good. Make Ogg-itect happy. :)


Once we have a standardized infrastructure, it opens up lots of possibilities. Like, how about a subclass of Compiler::Builder that accumulates statistics about the time spent on each stage of compilation and reports it at the end of the compile? Or language smoke-testing reports on the website broken down by compile stage? ("This test was successful through the POST stage, but this one never made it through the parse.")
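
As a rough illustration of the statistics idea, here is a sketch in Python. CompilerBuilder, its 'run_stage' hook, and the stage names are hypothetical; the point is only that a single overridden hook is enough once every stage goes through the same infrastructure:

```python
import time


# Hypothetical base class: every stage runs through one hook.
class CompilerBuilder:
    STAGES = ('parse', 'ast', 'ost')

    def run_stage(self, stage, data):
        # Placeholder work: record that the stage has run.
        return data + [stage]

    def compile(self, source):
        data = [source]
        for stage in self.STAGES:
            data = self.run_stage(stage, data)
        return data


class TimedCompilerBuilder(CompilerBuilder):
    """Accumulates the time spent in each stage of compilation."""

    def __init__(self):
        self.timings = {}

    def run_stage(self, stage, data):
        start = time.perf_counter()
        try:
            return super().run_stage(stage, data)
        finally:
            # Recorded even if the stage raises, so a failed compile
            # still shows how far it got.
            self.timings[stage] = time.perf_counter() - start

    def report(self):
        return ['%s: %.6fs' % (s, t) for s, t in self.timings.items()]


timed = TimedCompilerBuilder()
output = timed.compile('src')
lines = timed.report()
```

The same hook could feed per-stage smoke-test reports, since the set of recorded stages shows the last one reached.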

Allison
