Hi Andrew,
On 17/11/2017 at 12:26, Prof. Andrew P. Black wrote:
On 17 Nov 2017, at 14:10 , Thierry Goubier <thierry.goub...@gmail.com> wrote:
there is an 'E O F' token generated by SmaCC; I haven't tried to use it in a
parser yet.
I tried patching the tokenActions table to trap on this, but the token id for E
O F is outside of the range of the table. The Python example that you pointed
me to is a little different. It overrides scannerError, and explicitly adds a
newline token if there is an error at the end of the file. It doesn’t actually
use the E O F token, but it is probably a pattern that I can steal.
In all honesty, I wasn't thinking about that; I was thinking of being able to
write '<eof>' in the grammar itself to terminate statements.
The Python approach is necessary because you may have to emit additional
dedent tokens at the end of a file (this is a typical issue with
languages where indentation is meaningful: an idea used at the very
beginning of programming languages, then considered harmful, then
coming back again...).
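
To make the end-of-file issue concrete, here is a minimal sketch in Python of an indentation-aware tokenizer. It is not the SmaCC Python scanner; the function name `tokenize` and the token kinds (`INDENT`, `DEDENT`, `LINE`, `NEWLINE`, `EOF`) are assumptions made up for illustration. It only shows why the scanner has to synthesize tokens once the input runs out: any block still open at that point needs its closing dedent.

```python
def tokenize(source):
    """Yield (kind, text) tokens for a toy indentation-sensitive language.

    Simplifications (assumptions of this sketch): indentation is measured
    in leading spaces only, and a dedent to a width that was never opened
    is not reported as an error.
    """
    indents = [0]  # stack of currently open indentation widths
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines do not affect block structure
        width = len(line) - len(line.lstrip(" "))
        if width > indents[-1]:
            indents.append(width)
            yield ("INDENT", "")
        while width < indents[-1]:
            indents.pop()
            yield ("DEDENT", "")
        yield ("LINE", stripped)
        # Emitted for every line, so a missing final newline is harmless.
        yield ("NEWLINE", "")
    # End of input: close every block that is still open.
    while len(indents) > 1:
        indents.pop()
        yield ("DEDENT", "")
    yield ("EOF", "")
```

For an input that stops two levels deep without a final newline, such as `"if x:\n    y\n        z"`, the last tokens produced are a `NEWLINE`, two `DEDENT`s and `EOF`, none of which correspond to characters in the file.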
In the meantime, I made the final StatementSeparator (<newline> or ";")
optional in all the productions. The grammar is a bit ugly, but the parser is cleaner.
That is the cleanest way to do it (at least that way the workaround is
documented in the grammar itself, instead of carrying around a grammar
plus hacks in the scanner) (*)
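
As a rough illustration of what "optional final separator" means at the grammar level, here is a small Python sketch. It is not SmaCC syntax and not the grammar discussed here; the rule in the docstring and the names `SEPARATORS` and `parse_statements` are hypothetical. The point is only that the last statement is accepted with or without a trailing separator, with no help from the scanner.

```python
SEPARATORS = {"\n", ";"}  # stand-ins for <newline> and ";"


def parse_statements(tokens):
    """Group a flat token list into statements.

    Shape recognised (hypothetical, for illustration only):
        Statements : Statement (Separator Statement)* Separator?
    """
    statements, current = [], []
    for token in tokens:
        if token in SEPARATORS:
            if not current:
                raise SyntaxError("separator without a preceding statement")
            statements.append(current)
            current = []
        else:
            current.append(token)
    if current:  # the final statement had no trailing separator
        statements.append(current)
    return statements
```

With this shape, `["a", ";", "b"]` and `["a", ";", "b", ";"]` yield the same two statements.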
I also gave up trying to eliminate intermediate parseTree nodes. Instead, I
eliminated intermediate productions from the grammar. This makes the grammar
uglier (it has several repetitions where I inlined the intermediate
productions), but the tree construction is a lot more straightforward.
Sorry for having been unable to answer your questions on that :( I'm
happy to learn you've found a way around it.
Thierry
(*) Which is still way better than a hand-written, recursive descent
parser where any line can hide a hack...
Andrew