Flex/Bison: which version to use?

2005-06-13 Thread Frans Englich

Hello,

From a vague memory of fellow developers' experiences, I have the idea that 
the version of Bison/Flex one uses is significant. But perhaps that isn't 
relevant anymore.

For my project, written in C++, I simply downloaded the latest versions (at 
least at that time): Bison 2.0 and Flex 2.5.31.

Does it matter which version one uses? What is recommended? In what scenarios 
does the version matter (portability, programming language, functionality, 
etc.), if at all?

Another question: I noticed that I have "flex++" installed, and a quick 
googling shows that "bison++" exists as well. Are those "++" versions 
established software that people use? What are their advantages? When should 
I consider using the "++" programs?


Thanks in advance,

Frans


___
Help-bison@gnu.org http://lists.gnu.org/mailman/listinfo/help-bison


Re: Flex/Bison: which version to use?

2005-06-13 Thread Hans Aberg

At 11:28 + 2005/06/13, Frans Englich wrote:

For my project, written in C++, I simply downloaded the latest versions (at
least at that time): Bison 2.0 and Flex 2.5.31.


Those are the latest official versions. For Bison, there is also a test 
version, 2.0a, which you may want to try out to make sure the next release 
works for you:

  ftp://alpha.gnu.org/gnu/bison/bison-2.0a.tar.gz


Does it matter which version one uses? What is recommended? In what scenarios
does the version matter (portability, programming language, functionality,
etc.), if at all?


Typically, only the latest official version is supported. If you have 
problems with an earlier version, the people helping will not remember those 
versions, and don't expect a bug fix; more likely, the problem has already 
been reported and fixed in a later version.


If one needs bug fixes, sometimes one may need to pull down the latest alpha 
or a snapshot from CVS. I did that with Flex on Mac OS X 10.3.9.



Another question: I noticed that I have "flex++" installed, and a quick
googling shows that "bison++" exists as well. Are those "++" versions
established software that people use? What are their advantages? When should
I consider using the "++" programs?


Those are independent programs, and supposedly old ones. Bison and Flex have 
nothing to do with them, as far as I know.

--
  Hans Aberg




Modularized parser files covering similar grammars

2005-06-13 Thread Frans Englich

Hello,

I have a design dilemma that will become real some time in the future, and 
considering how large it is, I thought it would be a good idea to take a 
quick look ahead.

I am building a Bison parser for a language, or to be precise, multiple 
languages which are all very similar. I have a "main" language, accompanied 
by several other languages which are all subsets of the main language.

To be precise, I'm building a parser for the XPath language, and the different 
flavours I need to be able to distinguish are:

* XPath 2.0. This is as broad as it gets.
* XPath 1.0. A subset of XPath 2.0; XPath 2.0 is an extension of XPath 1.0.
* XSL-T 2.0 Patterns. A small subset of XPath 2.0.
* XSL-T 1.0 Patterns. A small subset of XPath 1.0.
* W3C XML Schema Selectors. An even smaller subset of XPath 1.0.

My question is how I should practically modularize the code in order to 
efficiently support these different languages.

First of all, my thought is that the scanner (Flex) is the same in every 
case (e.g., it supports all tokens in XPath 2.0), and that distinguishing 
the various "languages" is done at a higher level (the parser).

Distinguishing XPath 1.0/2.0 is, from what I can tell, the easiest. Since 
XPath 2.0 is an extension of 1.0, one can pass the parser an argument which 
signifies whether it is 1.0 that is being parsed, and in the actions for 
2.0-only expressions error out if 1.0 is being parsed.

In other words, conditional checks on a per-action basis.
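[A minimal sketch of such a per-action check, as a Bison fragment. The names 
(langVersion, XPATH_10, makeRangeExpr) and the choice of the "to" range 
expression (an XPath 2.0-only construct) are illustrative, not from the 
thread:]

RangeExpr:
    AdditiveExpr "to" AdditiveExpr
    {
        if (langVersion == XPATH_10)
          {
            yyerror("'to' range expressions require XPath 2.0");
            YYERROR;
          }
        $$ = makeRangeExpr($1, $3);
    }
;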

This approach, however, easily becomes complex when taking the other 
grammars into account, because one needs to be "context" aware. For example, 
XSL-T Patterns is a subset, but the disallowed constructs are only 
disallowed in certain scenarios. Hence, continuing with conditional 
tests ("What language am I parsing?") inside actions would require 
implementing "non-terminal awareness".

Another approach, which seems attractive to me if it's possible, is to 
modularize the grammar at the API/file level. For example, the tokens are 
declared in one file, the non-terminals are grouped into files, and a 
separate parser is constructed for each language. It would be preferable if 
it were also modularized at the object level, but I guess the disadvantage 
wouldn't be that big if it weren't. In other words, if one could "select the 
start token depending on the language", it seems it would solve my problems. 
I don't know how this "Bison modularization" would be done in practice, 
though.


What are people's experiences with these kinds of problems? What are the 
approaches for solving them?


Cheers,

Frans

PS.

For those interested, here are the EBNF productions for what I'm talking 
about:

XPath 2.0 (1.0 is merely a subset):
http://www.w3.org/TR/xpath20/#nt-bnf

XSL-T Patterns:
http://www.w3.org/TR/xslt20/#pattern-syntax

W3C XML Schema Selectors:
http://www.w3.org/TR/xmlschema-1/#coss-identity-constraint

btw, there's also an interesting document wrt parser/scanner construction & 
XPath, "Building a Tokenizer for XPath or XQuery":
http://www.w3.org/TR/2005/WD-xquery-xpath-parsing-20050404/




Re: Modularized parser files covering similar grammars

2005-06-13 Thread Akim Demaille
>>> "Frans" == Frans Englich <[EMAIL PROTECTED]> writes:

 > My question is how I should practically modularize the code in order to 
 > efficiently support these different languages.

In the future, I would like to have something like %import in Bison,
but currently, you'll have to put everything into a single file (or
run your own process beforehand).

 > What are people's experiences with these kinds of problems? What are the 
 > approaches for solving them?

I don't know how similar/different your languages are, but if they share 
some large parts, say there are common sublanguages covered by identical 
nonterminals, then the following technique might be useful.

I have two similar languages (in fact it's almost a single grammar
with two entry points).  First, in the parser I have:

%start program
%%
program:
  /* Parsing a source program.  */
  "seed-source" exp              { tp.exp_ = $2; }
| /* Parsing an imported file.  */
  "seed-import" "let" decs "end" { tp.decs_ = $3; }
;

In fact I'm looking for either `exp' or `"let" decs "end"', but I have the 
fake seed-* tokens in front.

Then in the (Flex) scanner, I have:

...
%%
%{
  /* Be ready to insert the seed. */
  if (seed)
    {
      int res = seed;
      seed = 0;
      return res;
    }
%}

where seed is initialized in some way to the first token you want to send 
(that depends on whether your parser is pure or not, etc.).

There is no limitation on the number of initial tokens, i.e., on the actual 
number of start symbols.





"Eating" comments: not with Flex but with Bison

2005-06-13 Thread Frans Englich

Hello,

In some languages there are constructs which are insignificant to the parse 
tree in the same way that white space (sometimes) is. Comments are one such 
example.

The Flex manual has an example of how to do this at the scanner level: 
patterns which match the comments but don't return tokens:

http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html#SEC11

I think I have a special scenario wrt comment handling: in one version of my 
language comments are allowed, while in another they are not. Hence, 
depending on the version, I want to flag the existence of comments as syntax 
errors, regardless of whether they are otherwise valid.

I would prefer to do this at the Bison/parser level because it is 
convenient: there I have access to the various information passed to the 
parse function, the YYERROR macro, and the error function.

The problem I see, if I let Flex return a COMMENT token and add a 
non-terminal in the Bison grammar to implement the checking, is how to make 
it play well with the other rules -- the token gets in the way.

What would solve my problem (AFAICT) is if I could write a 
non-terminal ("Comment") that matched the COMMENT token and then simply ate 
it, such that the parser could continue to deduce the "real" tokens as if 
the COMMENT had never existed (while the action code nevertheless checked 
whether the comment was allowed). AFAICT, something like that must be done, 
since I can't add the COMMENT token everywhere (it can appear between any 
two tokens).

Any ideas on how to do that? (Something with yyclearin?) Or am I perhaps 
trying to solve the problem in the wrong way? (Perhaps I should put the 
handling in the scanner, for example.)


Also, I've asked a lot of questions -- tell me if I'm asking too much, or 
point me to the docs if I haven't RTFM.


Cheers,

Frans




Re: "Eating" comments: not with Flex but with Bison

2005-06-13 Thread Hans Aberg

At 20:46 + 2005/06/13, Frans Englich wrote:

I think I have a special scenario wrt comment handling: in one version of my
language comments are allowed, while in another they are not. Hence,
depending on the version, I want to flag the existence of comments as syntax
errors, regardless of whether they are otherwise valid.


You could try to set a context switch in the lexer (see the Flex manual, 
"start conditions") on the comment-lexing rule(s). This context switch can 
then be turned on/off in various ways. For example, the lexer could be 
initialized (right after %% in the .l file) with code checking a global 
variable that does this. The variable can be turned on/off from the parser.
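[A minimal sketch of what that could look like in the .l file, as a Flex 
fragment. It assumes XPath 2.0's "(: ... :)" comment syntax; the 
commentsAllowed global and the TOK_ERROR token are illustrative, not from 
the thread:]

%x COMMENT
%%
"(:"            {
                  if (!commentsAllowed)
                    return TOK_ERROR;  /* let the parser flag a syntax error */
                  BEGIN(COMMENT);
                }
<COMMENT>":)"   BEGIN(INITIAL);
<COMMENT>.|\n   ;  /* eat the comment body */

[Note that real XPath 2.0 comments may nest, so a full implementation would 
also keep a nesting-depth counter in the COMMENT condition.]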

--
  Hans Aberg

