Looks like you are trying to do things in the lexer that actually belong in the parser. Try keeping the lexer to the bare minimum and move the rest of the parsing logic into the parser.
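A minimal sketch of that split follows. It is written in Python rather than as an ANTLR grammar so that it runs on its own. The lexer only produces hypothetical WORD, BLANK, NEWLINE and PUNCT tokens and never decides what a value means. The parser collects the whole value up to the end of the line and only then decides whether it is an Integer, an Ipaddress or an unquoted string. The token names, the value regexes and the line-based `key = value` format are assumptions modelled loosely on the grammar quoted below, and quoted strings are left out for brevity.

```python
import re
from dataclasses import dataclass

# Hypothetical minimal token set: the lexer only splits the input into
# words, blanks, newlines and punctuation; it never decides what a value means.
TOKEN_SPEC = [
    ("WORD", r"[^\s=,{}]+"),
    ("BLANK", r"[ \t]+"),
    ("NEWLINE", r"\n"),
    ("PUNCT", r"[=,{}]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

# Value classification happens in the parser, on the complete value text.
INTEGER_RE = re.compile(r"[+-]?[0-9]+")
IPADDRESS_RE = re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}")


@dataclass
class Token:
    kind: str
    text: str


def lex(source: str) -> list[Token]:
    """Split source into tokens; raises SyntaxError on characters no rule covers."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        tokens.append(Token(m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def classify(value: str) -> str:
    """Decide the value type only after the whole value has been collected."""
    if INTEGER_RE.fullmatch(value):
        return "Integer"
    if IPADDRESS_RE.fullmatch(value):
        return "Ipaddress"
    return "Ustring"


def parse(tokens: list[Token]) -> list[tuple[str, str, str]]:
    """Parse 'key = value' lines into (key, value_kind, value_text) triples."""
    result = []
    i, n = 0, len(tokens)

    def skip(*kinds: str) -> None:
        nonlocal i
        while i < n and tokens[i].kind in kinds:
            i += 1

    while True:
        skip("BLANK", "NEWLINE")
        if i >= n:
            return result
        if tokens[i].kind != "WORD":
            raise SyntaxError(f"expected a key, got {tokens[i].text!r}")
        key = tokens[i].text
        i += 1
        skip("BLANK")
        if i >= n or tokens[i].text != "=":
            raise SyntaxError(f"expected '=' after {key!r}")
        i += 1
        skip("BLANK")
        # The value runs to the end of the line; blanks inside it are kept.
        parts = []
        while i < n and tokens[i].kind != "NEWLINE":
            if tokens[i].kind == "PUNCT":
                raise SyntaxError(f"unexpected {tokens[i].text!r} in value")
            parts.append(tokens[i].text)
            i += 1
        value = "".join(parts).rstrip(" \t")
        result.append((key, classify(value), value))
```

For example, `parse(lex("port = 0 \nname = 3 a\n"))` returns `[("port", "Integer", "0"), ("name", "Ustring", "3 a")]`. The trailing blank after `0` no longer matters, because the parser sees the whole value before it chooses a type.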
Can you post a small sample input you are trying to parse?

- Indhu

From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of Paul Bouché
Sent: Wednesday, February 18, 2009 5:21 AM
To: Sidharth Kuruvila
Cc: antlr-inter...@antlr.org
Subject: Re: [antlr-interest] Lexer ambigiuoties

Hi,

that does not work. The problem arises when I define a rule for unquoted strings like this (where the comma is a delimiter):

    Ustring : Integer ' '+ ~('\n' | '{' | ',')
            | Name ' '+ ~('\n' | '{' | ',')
            | ~(' ' | '\n' | ',')+
            ;

The lexer will match >>3<< as an integer, but >>3 << now causes an error, whereas before this was fine. Of course, how should the lexer know that in one case the blank is whitespace and in another case it is part of the value, e.g. >>3 a<<? What I would like to write is:

    Ustring : ~Name | ~Integer;

but this is not possible.

BR,
Paul

Sidharth Kuruvila wrote:

Try moving the rule for Name below Ipaddress.

Regards,
Sidharth

On Wed, Feb 18, 2009 at 1:23 AM, "Paul Bouché (NSN)" <paul.bou...@nsn.com> wrote:

Hi,

I have a lexer which already recognizes valid tokens of different types, e.g. an integer generates an Integer token, a quoted string a String token, an IP address an Ipaddress token, etc. For example:

    property : key '=' value;
    key : Name;
    value : Integer | String | Ipaddress;
    Name : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | ':' | '%')+;
    Integer : ('+'|'-')? ('0'..'9')+;
    Ipaddress : ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+; // simplified; the actual grammar correctly allows at most three digits per part
    String : ( '\'' ( STRING_ | '`' | '"' | '\\' '\'' )* '\''
             | '"' ( STRING_ | '`' | '\'' | '\\' '"' )* '"'
             );
    WHITESPACE : ( ' ' | '\t' | '\n' )+ { skip(); } ;

All works fine. Now I need to support unquoted strings that contain blanks. The problem is '0 ' (a zero followed by a blank, without the quotes of course): I cannot get the lexer to match this as an Integer as before.
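To see why rule order and a blank-accepting rule matter here, below is a simplified Python model of how a lexer picks a rule at one position: every rule is tried, the longest match wins, and on a tie the rule listed first wins. The regexes are rough, hypothetical transcriptions of the Name, Integer and Ipaddress rules above, plus a hypothetical unquoted-string rule that may contain blanks. ANTLR's generated lexer predicts rules with a DFA rather than literally trying each one, so treat this only as an approximation of its behaviour.

```python
import re

# Simplified model of lexer rule selection at one position: every rule is
# tried, the longest match wins, and on a tie the rule listed first wins.
# The patterns are rough, hypothetical transcriptions of the grammar above.
NAME = ("Name", re.compile(r"[a-zA-Z0-9_\-:%]+"))
INTEGER = ("Integer", re.compile(r"[+-]?[0-9]+"))
IPADDRESS = ("Ipaddress", re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}"))
# Hypothetical unquoted-string rule: may contain blanks, but not newlines,
# commas or '{', and must not start with a blank.
USTRING = ("Ustring", re.compile(r"[^ \n,{][^\n,{]*"))


def pick_rule(rules, text, pos=0):
    """Return (rule_name, lexeme) for the rule that wins at text[pos:]."""
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = pattern.match(text, pos)
        # Strict '>' means an earlier rule keeps the token on a tie.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name, text[pos:pos + best_len]
```

In this model, `pick_rule([NAME, INTEGER, IPADDRESS], "0")` yields `("Name", "0")` because Name also accepts digits and is listed first; moving Name below Integer and Ipaddress yields `("Integer", "0")`. Once the blank-accepting rule is added, `"0 \n"` comes out as `("Ustring", "0 ")` simply because that rule matches more input.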
Basically I want a rule that says: if the input is not one of the previous tokens, try to match it as an unquoted string. Of course, an unquoted string may not contain newlines. Any hints on how to achieve this? I have tried everything and repeatedly ran into "code too large" errors, because the actual grammar is much more complex (there are more unquoted values, which are recognized by certain prefix characters such as < 0x :: etc.).

Thanks a bunch!
Paul

--
Paul Bouché
Voice: +49 30 590080-1284
Nokia Siemens Networks GmbH & Co. KG, An den Treptowers 1, 12435 Berlin, Germany
Sitz der Gesellschaft: München / Registered office: Munich
Registergericht: München / Commercial registry: Munich, HRA 88537
WEEE-Reg.-Nr.: DE 52984304
Persönlich haftende Gesellschafterin / General Partner: Nokia Siemens Networks Management GmbH
Geschäftsleitung / Board of Directors: Lydia Sommer, Olaf Horsthemke
Vorsitzender des Aufsichtsrats / Chairman of supervisory board: Lauri Kivinen
Sitz der Gesellschaft: München / Registered office: Munich
Registergericht: München / Commercial registry: Munich, HRB 163416

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

--
I am but a man.