On Mon, 24 Dec 2007, Ilyes Gouta wrote:

> Thank you, Laurence and Evan, for your extensive explanations!
You're welcome. It occurred to me that another reason why GCC creates a
syntax tree, or whatever it's called (I know virtually nothing about the
theory of compilers), is that GCC has multiple front-ends for different
languages, e.g. FORTRAN and Pascal. I'm not sure how C++ is handled, since
it requires support for features such as code generation for templates.

Evan brought up a problem which seems to me to be related to something I
call "the parse before the parse": sometimes one needs information to steer
the course of parsing which can only be found by more parsing, having taken
place beforehand. Multiple passes are one solution. Another possibility
_might_ be to spawn a thread and call the same or a different parser to
handle some section of the input and/or code generated by your main parser,
and then make its results available somewhere. The technique of "faking"
tokens, and the use of an object passed as a parameter to `yyparse' to hold
information and/or the state of the program, would probably be useful here.
I find that Bison and threads work well together, and the extra effort of
making `yyparse' thread-safe is minimal.

Laurence

> Thank you, Laurence and Evan, for your extensive explanations! I had a
> look at ANTLR; it seems to be a great tool and will probably be my second
> option. I think I'm going to try to get the scanner to construct an AST
> (instead of emitting code), so I'll stick with Bison for a while. As you
> said, the rest of the compilation phases, such as resource binding and
> code emission, would then become direct products of the AST traversal.
>
> Thank you again for your suggestions!
>
> Best regards,
> Ilyes Gouta.
>
> Evan Lavelle wrote:
> > I had the same problem on my first language; here's what I did.
> >
> > I started off thinking that I needed a simple interpreter, so I wrote
> > the Bison code and basically did all the required work in the Bison
> > actions. This worked well; it looks like you're at this stage.
> > A little later, I realised that this wasn't quite good enough. The
> > specific problem, I think, was that I needed some forward declarations
> > - i.e. sometimes code earlier in the source file needed to know about
> > something later in the source file. Tricky. After some thought, I
> > decided that I could do this with minimal changes, just by rescanning
> > the input again with Bison. I scanned it once, remembered the bits I
> > needed to know about, reset yyin to the beginning of the file, and
> > scanned it again properly. I now had a 2-pass interpreter, which worked
> > well.
> >
> > As time went on, it became obvious that this was very limiting. I
> > needed to do all sorts of extra things; some of these were:
> >
> > 1 - more forward declarations
> >
> > 2 - the language allowed constant expressions, so the semantic checking
> > code had to confirm that an expression was actually constant. How do
> > you do this in one pass? The easy answer is to have an extra pass: you
> > scan the code finding everything that should be a constant, evaluate
> > any expression you find there, replace the expression with a constant
> > (if you can), and let the semantic checker confirm that there's a
> > constant there.
> >
> > 3 - as semantic checking became more complex, it became obvious that I
> > had to analyse *all* the code before I had enough information to
> > analyse *some* of the code. This will depend on your language.
> >
> > 4 - some things are next to impossible to code-generate just by
> > scanning and rescanning the input. Since you're looking at C, what
> > about sequence points, and expressions with side-effects? How do you
> > handle the side-effects? Imagine code-generating "y = x++, c(), x" -
> > where do you put in the 'x' increment?
> >
> > 5 - optimisations? A code generator for a new target? What if you have
> > to scan something right-to-left? And so on.
> > Anyway, it quickly became obvious that handling anything that isn't
> > trivial is next to impossible just by scanning and rescanning the
> > input. Fortunately, there's a really simple answer. The lexer and
> > scanner do very little, apart from creating an AST. The scanner is no
> > longer the 'compiler'; it's just a very simple front-end that creates a
> > data structure, the AST. The compilation process is now a standard
> > data-processing problem - it's nothing more than analysing,
> > manipulating, and transforming the AST. Each analysis or transformation
> > of the AST is, loosely speaking, a compiler "pass"; you can do this as
> > few, or as many, times as you wish. It's trivial to add a new pass when
> > you have some new feature or requirement. The AST *is* your program;
> > the scanning is basically irrelevant.
> >
> > And you can still have an "interpreter" at the end of it, if you want
> > one; as soon as you finish "code generation", you just run your "code",
> > or whatever it is that actually does anything. Technically, however,
> > you should probably call this JIT compilation rather than interpreting.
> >
> > The bad news is that you need to know a *lot* more to do this, over and
> > above writing a Bison scanner. The (user) scanner code (i.e. your
> > "actions") is probably a lot less than 5% of a practical and useful
> > compiler. So, stick with scanning your input until you find out what
> > the problems are and, if it's really necessary to fix them, rewrite
> > your code around an AST. If you need to find out more about ASTs, look
> > at Antlr; you can use the Antlr library to create and maintain your
> > AST.
> >
> > Evan

_______________________________________________
help-bison@gnu.org
http://lists.gnu.org/mailman/listinfo/help-bison