On 8/6/13 2:23 PM, gixxi wrote:
> Hi there,
> 
> I want to dive a bit deeper into the clojure compiler and I wonder whether 
> it follows the std procedure for compiled languages
> 
> Character Stream -> Scanner (lexical analysis) -> Token Stream
> Token Stream -> Parser (syntax analysis) -> Parse Tree
> Parse Tree -> Semantic Analysis -> Abstract Syntax Tree (AST)
> (Optional) AST -> Machine Independent code improvement -> Modified AST
> Modified AST -> Target Code Generation -> Target Language (here Java Byte 
> Code)
> 
> Thanks for hints and references regarding this topic. Literature on this 
> topic (e.g.  O'Reilly Clojure Programming by Emerick, Carper et Grand) 
> states that clojure programs are written using clojure datastructures and 
> directly represent an AST. But a task of semantic analysis is to check 
> whether a statement given in the programming language is valid in terms of 
> the programming language semantics. So some component has to check whether 
> the symbol foo in the list (foo "baz") denotes either a static function 
> applicable to a string literal or an instance function of the string literal 
> itself. Which component serves this job?
> 
> thanks & cheers
> 
> christian
> 

clojure being a lisp with a reader, the conversion of the character
stream to data structures doesn't happen in the compiler; the compiler's
input is a data structure produced by the reader.
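
to make the reader/compiler split concrete, here is a toy reader
sketched in python. it is not clojure's actual reader: Symbol, tokenize,
read_form and read_string are made-up names, and it only handles lists,
integers, symbols and strings without escapes. the point is just that
the output is ordinary data (nested lists), which is what a compiler
would then take as its input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for a lisp symbol."""
    name: str


def tokenize(text):
    """Split source text into parens, string literals and bare atoms."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace() or ch == ",":      # commas count as whitespace
            i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == '"':
            j = text.index('"', i + 1)     # toy: no escape handling
            tokens.append(text[i:j + 1])
            i = j + 1
        else:
            j = i
            while j < len(text) and not text[j].isspace() and text[j] not in '(),"':
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens


def read_form(tokens):
    """Consume tokens for one form and return it as plain data."""
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens[0] != ")":
            items.append(read_form(tokens))
        tokens.pop(0)                      # drop the closing paren
        return items
    if tok.startswith('"'):
        return tok[1:-1]
    try:
        return int(tok)
    except ValueError:
        return Symbol(tok)


def read_string(text):
    """Read the first form in text."""
    return read_form(tokenize(text))
```

with this, read_string('(foo "baz")') gives back a two element list
holding Symbol("foo") and the string "baz". nothing has been checked or
resolved yet, that is left to the later phases.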

the compiler does:

1. macro expansion
2. analysis
3. code generation

you could argue that macro expansion is just part of analysis (analysis
code calls macroexpand) but I think it is more useful mentally to keep
it as a distinct phase just before analysis.
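
a minimal sketch of the first two phases, in python and under heavy
assumptions: symbols are plain strings, there is a single made-up `when`
macro, and the "analysis" result is just nested tuples. code generation
is left out. it only illustrates the shape described above, where the
analysis code calls macroexpand itself and so only ever sees expanded
forms.

```python
# toy macro table: macro name -> function from argument forms to a new form
MACROS = {
    # (when test body...) => (if test (do body...))
    "when": lambda test, *body: ["if", test, ["do", *body]],
}


def macroexpand_1(form):
    """Expand the outermost macro call once, or return the form unchanged."""
    if isinstance(form, list) and form and isinstance(form[0], str) and form[0] in MACROS:
        return MACROS[form[0]](*form[1:])
    return form


def macroexpand(form):
    """Keep expanding until the head of the form is no longer a macro."""
    while True:
        expanded = macroexpand_1(form)
        if expanded is form:
            return form
        form = expanded


def analyze(form):
    """Analysis expands macros first, then classifies the expanded form."""
    form = macroexpand(form)
    if isinstance(form, list) and form:
        head, *rest = form
        if head in ("if", "do"):           # toy special forms
            return (head, *[analyze(f) for f in rest])
        return ("invoke", *[analyze(f) for f in form])
    if isinstance(form, str):
        return ("symbol", form)
    return ("const", form)
```

calling analyze(["when", "ready", ["launch", 1]]) never produces a
"when" node: by the time the form is classified it has already become an
`if` wrapping a `do`.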

the clojure compiler is not very complicated and does a pretty
straightforward mapping of clojure expressions to jvm byte code,
depending on the jvm's jit for optimizations.

the compilation unit in clojure is a single top level expression,
however there are some special purpose hooks in the compiler for
compiling a whole source file at a time. those hooks don't really change
anything about the generated code; they change the context the generated
code is emitted into. since the target is jvm byte code, you can't just
emit instructions: the instructions have to belong to a method, and the
method has to belong to a class.

there is a meme going around that lisp is its own ast, but if you've
worked on a compiler and are familiar with the richness of a real ast,
that tends to just seem silly. the clojure compiler does an analysis of
the clojure datastructures and emits a tree of richer expression objects
which is the ast, which the code generator uses to generate bytecode.
the ast objects have methods like emit(...) which is how the code
generation is actually done.
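
a rough python sketch of that shape, not the real compiler: the class
names only loosely echo the kind of expression objects described above,
emit here takes a plain list instead of a bytecode generator, and the
instruction names (LOAD_VAR, PUSH_CONST, INVOKE) and the known_vars
argument are invented for illustration. it also shows where a symbol
like foo in (foo "baz") gets looked at: during analysis, when the list
is turned into expression objects.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for a lisp symbol."""
    name: str


class Expr:
    """Base class for ast nodes; each node knows how to emit itself."""
    def emit(self, out: List[str]) -> None:
        raise NotImplementedError


@dataclass
class ConstantExpr(Expr):
    value: Any

    def emit(self, out):
        out.append(f"PUSH_CONST {self.value!r}")


@dataclass
class VarExpr(Expr):
    name: str

    def emit(self, out):
        out.append(f"LOAD_VAR {self.name}")


@dataclass
class InvokeExpr(Expr):
    fn: Expr
    args: List[Expr]

    def emit(self, out):
        # push the function, then its arguments, then the call itself
        self.fn.emit(out)
        for arg in self.args:
            arg.emit(out)
        out.append(f"INVOKE {len(self.args)}")


def analyze(form, known_vars):
    """Turn reader data into Expr objects, resolving symbols on the way."""
    if isinstance(form, Symbol):
        if form.name not in known_vars:
            raise NameError(f"unable to resolve symbol: {form.name}")
        return VarExpr(form.name)
    if isinstance(form, list) and form:
        fn, *args = form
        return InvokeExpr(analyze(fn, known_vars),
                          [analyze(a, known_vars) for a in args])
    return ConstantExpr(form)
```

analyzing [Symbol("foo"), "baz"] with foo known gives an InvokeExpr, and
calling emit on it appends a load of foo, a push of 'baz' and an invoke
with one argument. in this toy the only semantic check is that the
symbol resolves; whether foo actually accepts a string is not checked at
this stage.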

all this being said I don't think anyone is a huge fan of the current
compiler, but it has three huge advantages over alternatives:

1. it works (this might be debatable)
2. it has been in use for many years now
3. an alternative would involve making lots of effort to ultimately get
back to where we are

which is (in my opinion) why it still persists.

the clojurescript compiler (which shares no code with the clojure
compiler) could be considered a more modern approach to clojure
compilation, but there are important parts of the clojure environment
(vars mainly) that are missing from the clojurescript environment. the
clojurescript compiler has a very rich intermediate representation using
maps. in my opinion the intermediate representation is so rich it is
actually kind of hard to work with.
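
to give a feel for what a map based intermediate representation looks
like, here is a toy version using python dicts. the key names (op, form,
env, children) are an assumption, loosely modelled on that style of
analyzer output rather than copied from it, and the analyze and ops
functions are made up. note how every node, even a constant, carries its
original form and the whole environment along.

```python
def analyze(form, env):
    """Build a dict per node; symbols are plain strings in this toy."""
    if isinstance(form, list) and form:
        fn, *args = form
        return {
            "op": "invoke",
            "form": form,
            "env": env,
            "fn": analyze(fn, env),
            "args": [analyze(a, env) for a in args],
            "children": ["fn", "args"],    # keys that hold sub-nodes
        }
    if isinstance(form, str) and form in env["locals"]:
        return {"op": "local", "form": form, "env": env, "name": form}
    return {"op": "const", "form": form, "env": env, "val": form}


def ops(node):
    """Collect the op of every node, depth first, by following children."""
    found = [node["op"]]
    for key in node.get("children", []):
        child = node[key]
        for c in (child if isinstance(child, list) else [child]):
            found.extend(ops(c))
    return found
```

a generic walk like ops only needs the children key, which is the nice
part of the design; the cost is that each node is a fairly large map.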


-- 
And what is good, Phaedrus,
And what is not good—
Need we ask anyone to tell us these things?
