Hi, I think I mentioned on one of the bison lists a while ago I was thinking of using bison as a parser generator for a lisp program, which is part of a COBOL front-end for GCC.
The proof of concept is now working. It is a bit rough around the edges bit I think it's at a point where it's worth discussing the design and options. Contents 1. Have a look! 2. Challenges. 3. Options. 4. Conclusion. Have a look ----------- The code is all available for browsing from here: http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/ The files of interest are Skeleton for current cvs head (2.3a): http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/bison-parser-skel-v2.3a.m4?revision=1.1&view=markup Skeleton for V2.3 http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/bison-parser-skel-v2.3.m4?revision=1.1&view=markup Parse definition for simplified calc toy application: http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/calc-lisp-parser.lispy?revision=1.1&view=markup Input file for testing calc: http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/calc-test-input.txt?revision=1.1&view=markup Makefile (you may need to change the location of your bison executable) http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/Makefile?revision=1.1&view=markup Command file that pretends to be m4 to fix the file name up and to save the m4 input file, for debugging purposes. It also allows fine control over the M4 options. http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/bison_fake_m4.x?revision=1.1&view=markup This compiles and builds on Ubuntu linux.You can get all the code using cvs by following the instructions here http://sourceforge.net/cvs/?group_id=5709 Note the project documentation is out of date. I will be updating it in the next few days. Challenges ----------- 1. The use of m4 and the skeletons makes it viable to use bison in this way, whereas with Flex, with hand coded output, it was really not possible. M4 makes things a little fragile and harder to debug but there is no obvious alternative. 2. Version 2.3a is quite a lot friendlier to those who want to roll their own skeleton that 2.3. Even so, back-porting to 2.3 only took a couple of hours. 3. Given the differences between 2.3 and 2.3a it is not possible to have one skeleton that supports both, as far as I could tell. This means that I had to detect the version number, as follows > export BISON_VERSION=`if ${BISON} --version|grep "2.3a" >/dev/null; \> > then echo -n "2.3a"; \ > else if ${BISON} --version|grep "2.3" >/dev/null; then echo -n "2.3"; \ > else echo -n "_unknown_bison_version"; false; fi; fi` Any suggestions how to do this better? 4. The fake m4 program was needed for V2.3 to allow skeletons not in the official directory, or relative to the official directory. V2.3 prefixes your skeleton file name with the default skeleton directory. Thus -S /home/hacker/skel1.m4 => /use/share/bison//home/hacker/skel1.m4 This problem is fixed in 2.3a, which is excellent. 5. C-isms. Bison scans the actions, assuming they are C code or other dialects of C (Java/C++). This is not lisp compatible. I put a temporary fix in my actions where I embed a C comment start and end in Lisp comments, as follows { #|/*|# lisp code can go here as long as there is no "*/" embedded #|*/|# } I think the longer term fix for this is to have the language table drive a different versions of the scanners or logic within them. All three scanners (*.l) are impacted. 6. V2.3 generates actions which are C code and which is hard to convert to lisp code using M4 only. In the process I lose the ability to directly specify package names, though I can use macros to get around this. This problem us fixed in v2.3a. The main problem is where it says case 27: { ...code... break; } Getting rid of the colon but being selective is the problem. Is there an m4 way to say s/:$//? 7. (Speculation) The bison table structure is quite complex and is by now to some extent a legacy of earlier times when memory was in very short supply. It could probably be simplified, I suspect, at a very affordable cost in memory. This is just speculation because I don't know how big a simplistic hash table populated to 75% would be. If I end up not using bison I will know because this would be the approach I would use writing my own LR1/LALR parser generator. 8. (This one is purely a lisp issue and bison cannot really help). Some versions of Lisp lack an efficient case macro (equivalent of C switch) because other alternatives exist. So I actually set up the addresses of the actions in an array. This means the actions all have to be functions. The good thing about this is that you could just put the address of the function in the action, but on the other hand this divorces the action code from the location in the grammar. If you want to include the actions literally in the grammar. This means wrapping them in a literal function definition which looks like this #'(lambda (parm) code ...)/ As always with lisp, you can use macros to remove anything repetitive or ugly. 9. I have not implemented all the options fully eg pure versus not pure. Effectively all the parser are pure because Lisp closures make this basically costless. Next Steps ---------- I see three alternatives for a lisp parser 1. Code from scratch. Get out the dragon book and code from scratch. This would be about 1,000 lines of Lisp code including driver based on my experience writing a lexer. 2. Use the approach above. Depending on feedback I could finish this off, add documentation and fixing the rough edges and submit to include as part of bison. 3. Bison tables approach. Write a skeleton that just outputs the tables, in C format, with a user supplied driver. The driver would have access to all the tables and could output them in any desired format, such as Lisp (or other languages too). The main issues with this approach are a) Two levels of code generation are being used. No way to keep the line numbers. b) What to do about actions. One approach would be to have each action look something like this { printf(" ... lisp code ...); }. The lisp code would be embedded in C strings. Then have the skeleton run through all the actions for (i=0; i<n; i++) switch () {}... I have done this sort of thing before with no real problems. Conclusion ---------- Probably my inclination is to use the skeleton I have written. If there is interest I will look at polishing it for inclusion in bison in due course. Failing that, option 3 is most appealing because it involves the fewest changes to bison... just a simple skeleton which would be a variant of yacc.c. Any thoughts or comments are most welcome. Tim Josling _______________________________________________ help-bison@gnu.org http://lists.gnu.org/mailman/listinfo/help-bison