On 8/6/13 2:23 PM, gixxi wrote:
> Hi there,
>
> I wanne dive a bit more deep into the clojure compiler and I wonder whether
> it follows the std procedure for compiled languages
>
> Character Stream -> Scanner (lexical analysis) -> Token Stream
> Token Stream -> Parser (syntax analysis) -> Parse Tree
> Parse Tree -> Semantic Analysis -> Abstract Syntax Tree (AST)
> (Optional) AST -> Machine Independent code improvement) -> Modified AST
> Modified AST -> Target Code Generation -> Target Language (here Java Byte
> Code)
>
> Thanks for hints and references regarding this topic. Literature on this
> topic (e.g. O'Reilly Clojure Programming by Emerick, Carper et Grand)
> states that clojure programs are written using clojure datastructures and
> directly represent an AST. But a task of semantic analysis is to check
> whether a statement given in the programming language is valid in terms of
> the programming language semantics. So some component has to check whether
> the symbol foo in the list (foo "baz") denotes either a static function
> applicable to a string literal or a instance function of the string literal
> itself. Which component serves this job?
>
> thanks & cheers
>
> christian
>
Being a Lisp with a reader, Clojure doesn't convert the character stream to data structures in the compiler; the compiler's input is a data structure produced by the reader.

The compiler does:

1. macro expansion
2. analysis
3. code generation

You could argue that macro expansion is just part of analysis (the analysis code calls macroexpand), but I think it is more useful mentally to keep it as a distinct phase just before analysis.

The Clojure compiler is not very complicated and does a pretty straightforward mapping of Clojure expressions to JVM bytecode, depending on the JVM's JIT for optimizations.

The compilation unit in Clojure is a single top-level expression; however, there are some special-purpose hooks in the compiler for compiling a whole source file at a time. Those hooks don't really change anything about the generated code; they change the context the generated code is emitted into. The target being JVM bytecode, you can't just emit instructions: the instructions have to belong to a method, and the method has to belong to a class.

There is a meme going around that Lisp is its own AST, but if you've worked on a compiler and are familiar with the richness of a real AST, that tends to just seem silly. The Clojure compiler does an analysis of the Clojure data structures and emits a tree of richer expression objects, which is the AST, which the code generator uses to generate bytecode. The AST objects have methods like emit(...), which is how the code generation is actually done.

All this being said, I don't think anyone is a huge fan of the current compiler, but it has three huge advantages over alternatives:

1. it works (this might be debatable)
2. it has been in use for many years now
3. an alternative would involve making lots of effort to ultimately get back to where we are

which is (my opinion) why it still persists.
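To make the reader / macro expansion / analysis / code generation split above a bit more concrete, here is a toy sketch in Python. It is not the real compiler (which is written in Java and emits JVM bytecode): the names read, macroexpand, analyze, compile_form, Const, VarRef and Invoke, the env set of known names, and the tuple "instructions" are all made up for illustration. It only knows one macro (a simplified `->`), and its string literals can't contain spaces. The point is only that analysis is the step that turns plain data into richer node objects, resolves symbols such as foo in (foo "baz"), and that each node knows how to emit itself.

```python
from dataclasses import dataclass


class Symbol(str):
    """A symbol from the toy reader, kept distinct from a string literal."""


def read(src):
    """Toy reader: text -> nested lists of Symbols, ints and strs (one form)."""
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos):
        tok = tokens[pos]
        if tok == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = parse(pos)
                items.append(item)
            return items, pos + 1
        if tok.startswith('"'):
            return tok.strip('"'), pos + 1  # assumption: no spaces in strings
        if tok.lstrip("-").isdigit():
            return int(tok), pos + 1
        return Symbol(tok), pos + 1

    form, _ = parse(0)
    return form


def macroexpand(form):
    """Phase 1: rewrite (-> x f (g a)) into (g (f x) a); data in, data out."""
    if not isinstance(form, list) or not form:
        return form
    if isinstance(form[0], Symbol) and form[0] == "->":
        acc = form[1]
        for step in form[2:]:
            acc = [step[0], acc, *step[1:]] if isinstance(step, list) else [step, acc]
        return macroexpand(acc)
    return [macroexpand(f) for f in form]


@dataclass
class Const:
    value: object

    def emit(self, out):
        out.append(("push", self.value))


@dataclass
class VarRef:
    name: str

    def emit(self, out):
        out.append(("load", self.name))


@dataclass
class Invoke:
    fn: object
    args: list

    def emit(self, out):
        self.fn.emit(out)
        for arg in self.args:
            arg.emit(out)
        out.append(("invoke", len(self.args)))


def analyze(form, env):
    """Phase 2: data -> tree of node objects; symbols are resolved here."""
    if isinstance(form, Symbol):
        if form not in env:
            raise NameError(f"unable to resolve symbol: {form}")
        return VarRef(str(form))
    if isinstance(form, list) and form:
        fn, *args = form
        return Invoke(analyze(fn, env), [analyze(a, env) for a in args])
    return Const(form)


def compile_form(src, env):
    """Read one top-level form, then expand, analyze and emit it (phase 3)."""
    ast = analyze(macroexpand(read(src)), env)
    code = []
    ast.emit(code)
    return code
```

With env = {"foo"}, compile_form('(-> "baz" foo)', env) returns [("load", "foo"), ("push", "baz"), ("invoke", 1)], while compile_form('(bar "baz")', env) raises NameError during analysis, before anything is emitted.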
The ClojureScript compiler (which shares no code with the Clojure compiler) could be considered a more modern approach to Clojure compilation, but there are important parts of the Clojure environment (vars mainly) that are missing from the ClojureScript environment. The ClojureScript compiler has a very rich intermediate representation using maps. In my opinion, the intermediate representation is so rich it is actually kind of hard to work with.

--
And what is good, Phaedrus,
And what is not good—
Need we ask anyone to tell us these things?
