It looks like you are trying to do things in the lexer that really belong
in the parser. Try keeping the bare minimum in the lexer and move the rest
of the parsing logic into the parser.
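
To make that suggestion concrete, here is a minimal sketch of the idea, written in Python rather than ANTLR so that it can be run on its own. Everything in it is an assumption for illustration: the token kinds (QUOTED, WORD, BLANKS, ...), the helper names and the `key = value` line format are not taken from the actual grammar. The lexer only splits the input into coarse pieces and keeps the blanks; the parser decides afterwards whether a value is an Integer or an unquoted string.

```python
import re

# Coarse token kinds (hypothetical names). The lexer only splits the input;
# it does not decide whether "3" is an Integer or part of a longer value.
TOKEN_RE = re.compile(r"""
    (?P<QUOTED>'(?:\\.|[^'\\\n])*'|"(?:\\.|[^"\\\n])*")
  | (?P<EQ>=)
  | (?P<COMMA>,)
  | (?P<NEWLINE>\n)
  | (?P<BLANKS>[^\S\n]+)
  | (?P<WORD>[^\s=,]+)
""", re.VERBOSE)

INTEGER_RE = re.compile(r"[+-]?[0-9]+")


def tokenize(text):
    """Split text into (kind, text) pairs, keeping blanks as tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def _skip_blanks(tokens, i):
    while i < len(tokens) and tokens[i][0] == "BLANKS":
        i += 1
    return i


def parse_property(line):
    """Parse `key = value` and return (key, kind, value_text)."""
    tokens = tokenize(line)
    i = _skip_blanks(tokens, 0)
    if i >= len(tokens) or tokens[i][0] != "WORD":
        raise ValueError("expected a key")
    key = tokens[i][1]
    i = _skip_blanks(tokens, i + 1)
    if i >= len(tokens) or tokens[i][0] != "EQ":
        raise ValueError("expected '='")
    i = _skip_blanks(tokens, i + 1)
    # The value runs up to the next delimiter. Blanks between words are
    # part of the value; blanks at the end are plain whitespace.
    parts = []
    while i < len(tokens) and tokens[i][0] not in ("COMMA", "NEWLINE"):
        parts.append(tokens[i])
        i += 1
    while parts and parts[-1][0] == "BLANKS":
        parts.pop()
    if not parts:
        raise ValueError("expected a value")
    text = "".join(t[1] for t in parts)
    # The decision the lexer could not make is made here, with the
    # whole value in view.
    if len(parts) == 1 and parts[0][0] == "QUOTED":
        kind = "String"
    elif len(parts) == 1 and INTEGER_RE.fullmatch(text):
        kind = "Integer"
    else:
        kind = "Ustring"
    return key, kind, text
```

With this split, `parse_property("timeout = 3 ")` reports an Integer and `parse_property("label = 3 a")` reports an unquoted string, because the trailing blank is only trimmed once the parser has seen that nothing follows it.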

Can you post a small sample input you are trying to parse?


- Indhu


That does not work. The problem appears when I define a rule for unquoted
strings like this (where the comma is a delimiter):

Ustring : Integer ' '+ ~('\n' | '{' | ',') |  Name ' '+ ~('\n' | '{' | ',')
| ~(' ' | '\n' | ',')+;

The lexer matches >>3<< as an Integer, but >>3 << now causes an error,
whereas it worked before this rule was added. Of course, the lexer has no
way of knowing that in one case the blank is whitespace and in the other
it is part of the value, as in >>3 a<<.

What I would like to write is:

Ustring : ~Name | ~Integer;

but this is not possible.
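
The "anything that is not one of the specific tokens" idea can be approximated outside the lexer by trying the specific patterns in order and falling back to an unquoted string. The following is only a sketch under assumptions: it is Python, not ANTLR, the patterns are simplified copies of the Integer and Ipaddress rules from the original post, and `classify` is a hypothetical helper that receives the raw value text up to the next delimiter.

```python
import re

# Specific value types, tried in order. Simplified from the posted rules.
SPECIFIC = [
    ("Integer", re.compile(r"[+-]?[0-9]+")),
    ("Ipaddress", re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")),
]
# Fallback: anything without a newline or the comma delimiter.
UNQUOTED = re.compile(r"[^\n,]+")


def classify(raw):
    """Return (kind, text) for a raw value, trimming surrounding blanks."""
    text = raw.strip(" \t")
    for kind, pattern in SPECIFIC:
        # fullmatch: the whole value must fit the specific pattern.
        if pattern.fullmatch(text):
            return kind, text
    if UNQUOTED.fullmatch(text):
        return "Ustring", text
    raise ValueError(f"not a valid value: {raw!r}")
```

Because the blanks are trimmed before matching, `classify("0 ")` is an Integer while `classify("3 a")` falls through to Ustring. The order of `SPECIFIC` plays the role of "not any of the previous tokens".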


Sidharth Kuruvila wrote:

Try moving the rule for Name below Ipaddress.


On Wed, Feb 18, 2009 at 1:23 AM, "Paul Bouché (NSN)" <paul.bou...@nsn.com> wrote:


I have a lexer which already recognizes valid tokens of different types,
e.g. an integer generates an Integer token, a quoted string a String
token, an IP address an Ipaddress token, etc.

property : key '=' value;
key : Name;
value : Integer | String | Ipaddress;
Name : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | ':' | '%')+;
Integer : ('+'|'-')? ('0'..'9')+;
Ipaddress : ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+;
// simplified; the actual grammar correctly enforces a maximum of three digits
String :  ( '\'' ( STRING_ | '`' | '"' | '\\' '\'' )* '\''
        | '"' ( STRING_ | '`' | '\'' | '\\' '"' )* '"' );
WS :  ( ' ' | '\t' | '\n' )+   // rule name missing in the post; WS is assumed
      { skip(); };

All works fine. Now I need to include unquoted strings with blanks. The
problem is '0 ' (zero blank, without the quotes of course): I can no longer
get the lexer to match this as an Integer as before. Basically I want a rule
which says: if the input is not one of the previous tokens, try whether it
is an unquoted string. Of course an unquoted string may not contain newlines.
Any hints on how to achieve this?
I tried everything and several times ran into "code too large" errors
because the actual grammar is much more complex (there are more unquoted
values which are recognized by certain prefix characters such as < 0x
:: etc.).

Thanks a bunch!

