At 11:40 +1000 2005/06/28, Neil Conway wrote:
I'm trying to use Bison to construct a parser for a language A that
needs to allow syntax from a second language B to be embedded within
it (at well-defined positions within A). For example, suppose a
production from the grammar of A is:
if_stmt: IF expr THEN if_body END IF ;
where "expr" is a production in *another* Bison grammar. Language B
is defined by a fairly complex grammar that changes with some
regularity (in this case, the embedded language B is SQL, defined by
an ~8500 line bison grammar). B's grammar is maintained separately
-- merging B's grammar into the grammar for A and maintaining two
copies is a headache I'd like to avoid, if possible. Ideally I'd
like to have A's parser "call into" B's parser, have it accept as
much input as it can, and then return control to A's parser.
Is this possible? One idea would be to use hand-written parsers for
both languages, although it would be nice to stick with Bison if
possible.
Bison only one single .y file as input, and must be presented such a
file. One way to do it, might to use a preprocessor (Perl, M4
perhaps) so merge the files.
The problem might be, though, that the combined grammar isn't LALR(1) anymore.
(At present this is implemented by having the lexer look for a
delimiter in the input that we know marks the end of the tokens of
the embedded language. We then concatenate these tokens into a
string, and eventually pass that string to the yyparse() of the
embedded language's parser. So in the above example we would
basically consume tokens until we see a "THEN", and assume that
anything between the "IF" and the "THEN" is a SQL statement. This is
ugly, to say the least.)
There are a few methods in use to combine grammars. One is to merge
grammars together, as you suggested. Another is what you already is
using. Sometimes, one is using a hybrid:
If one has a large number dynamic operator precedences, as in Prolog
for example, then it is not really feasible to write that into a
large Bison's grammar directly. So one would a small grammar, putting
the operators on a stack, and then let a sort out the precedences in
the actions. If the number of precedence is small, as Haskell, which
only has ten levels, one could write it in directly in a static
grammar, though. There, some do it, others use the first method.
In general, no method is better than the other; in the specific case,
one chooses the one most convenient, simply.
You might also get more inputs on this question in the newsgroup
comp.compilers.
--
Hans Aberg
_______________________________________________
Help-bison@gnu.org http://lists.gnu.org/mailman/listinfo/help-bison