I am actually referring to how ANTLR decides which token to generate next. The simplest case is a lexer with a NUMBER rule, a DOT rule and a FLOATING_POINT rule. For the input "1." ANTLR could theoretically create a NUMBER token followed by a DOT token, but instead it only tries to match FLOATING_POINT, which fails.
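To make the "1." case concrete, here is a small toy model in Python. It is only a sketch and not ANTLR's actual implementation: the rule table, the one-character lookahead in `predict`, and the function names are all assumptions made for illustration. `lex_committing` picks a rule from lookahead and then commits to it, which is the behaviour described above; `lex_longest_match` instead tries every rule and keeps the longest one that fully matched, which is what falling back to NUMBER followed by DOT would look like.

```python
import re

# Hypothetical rule table; the names mirror the rules mentioned in the message.
RULES = {
    "FLOATING_POINT": re.compile(r"[0-9]+\.[0-9]+"),
    "NUMBER": re.compile(r"[0-9]+"),
    "DOT": re.compile(r"\."),
}

DIGITS = re.compile(r"[0-9]+")


def predict(text, pos):
    """Choose a rule by looking just past the digits (assumed, simplified lookahead)."""
    m = DIGITS.match(text, pos)
    if m:
        end = m.end()
        # Digits followed by '.' look like the start of a FLOATING_POINT.
        if end < len(text) and text[end] == ".":
            return "FLOATING_POINT"
        return "NUMBER"
    if text[pos] == ".":
        return "DOT"
    raise SyntaxError(f"no rule can start at position {pos}")


def lex_committing(text):
    """Commit to the predicted rule; fail if that rule does not complete."""
    tokens, pos = [], 0
    while pos < len(text):
        name = predict(text, pos)
        m = RULES[name].match(text, pos)
        if m is None:
            raise SyntaxError(f"committed to {name} at position {pos}, but it does not match")
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens


def lex_longest_match(text):
    """Try every rule and keep the longest one that fully matched."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in RULES.items():
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise SyntaxError(f"no rule matches at position {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens
```

In this model both functions agree on "1.5", but on "1." `lex_committing` raises an error while `lex_longest_match` yields a NUMBER token followed by a DOT token.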
Curious... Why not backtrack in such cases and return NUMBER?

----- Original Message -----
From: Johannes Luber <jalu...@gmx.de>
To: paul bouche <paul.bou...@apertio.com>
Cc: antlr-inter...@antlr.org
Sent: Wednesday, February 18, 2009 7:33:20 PM GMT+0530 Asia/Calcutta
Subject: Re: [antlr-interest] Lexer ambigiuoties

> Johannes Luber wrote:
> > The deeper problem lies in the fact that ANTLR uses an insufficient
> > algorithm to sort out input that is unambiguous for humans correctly
> > in all cases.
> From the book I glean that LL(*) does cover all context-free languages.
> Aren't the cases that are unambiguous for humans but ambiguous for
> computers only unambiguous to humans because they have context? A blank,
> or any other character for that matter, may be interpreted as white space
> in one case and has to be interpreted differently in another. The
> difference between those cases is context, i.e. what came before and
> what the next k-ahead symbol is.
>
> Or could you be more concrete about what you mean by "uses an insufficient
> algorithm"? Ah, I just realized that the parser is LL(*) but the lexer
> uses a cyclic DFA for prediction, which may not cover all context-free
> languages and certainly not context-sensitive ones.

I am actually referring to how ANTLR decides which token to generate next. The simplest case is a lexer with a NUMBER rule, a DOT rule and a FLOATING_POINT rule. For the input "1." ANTLR could theoretically create a NUMBER token followed by a DOT token, but instead it only tries to match FLOATING_POINT, which fails.

Johannes

>
> BR,
> Paul
>
> > Not sure if changing the algorithm would help here, too, but it would
> > at least simplify the common cases. Unfortunately, it isn't clear when Ter
> > will finally do a rewrite here.
> >
> > Johannes
> >
> >> Johannes Luber wrote:
> >>> Paul Bouché (NSN) wrote:
> >>>> Hi,
> >>>>
> >>>> I have a lexer which already recognizes valid tokens of different types,
> >>>> e.g. an integer will generate an integer token, a quoted string a string
> >>>> token, an IP address an ipaddress token etc.
> >>>> E.g.:
> >>>>
> >>>> property : key '=' value;
> >>>> key : Name;
> >>>> value : Integer | String | Ipaddress;
> >>>> Name : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | ':' | '%')+;
> >>>> Integer : ('+'|'-')? ('0'..'9')+;
> >>>> Ipaddress : ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+ '.' ('0'..'9')+;
> >>>> // simplified; the actual grammar correctly allows at most three digits
> >>>> String : ( '\'' ( STRING_ | '`' | '"' | '\\' '\'' )* '\''
> >>>>          | '"' ( STRING_ | '`' | '\'' | '\\' '"' )* '"'
> >>>>          );
> >>>> WHITESPACE
> >>>>     :
> >>>>     ( ' ' | '\t' | '\n' )+
> >>>>     { skip(); }
> >>>>     ;
> >>>>
> >>>> All works fine. Now I need to include unquoted strings with blanks. The
> >>>> problem is '0 ' (zero followed by a blank, without the quotes of course).
> >>>> I can no longer get the lexer to match this as an Integer as before.
> >>>> Basically I want a rule which says: if the input is none of the previous
> >>>> tokens, try whether it is an unquoted string. Of course an unquoted
> >>>> string may not contain newlines.
> >>>> Any hints on how to achieve this?
> >>>> I tried everything and ran several times into "code too large" exceptions
> >>>> because the actual grammar is much more complex (there are more unquoted
> >>>> values which are recognized by certain prefix characters such as < 0x
> >>>> :: etc.).
> >>>>
> >>>> Thanks a bunch!
> >>>> Paul
> >>>
> >>> Try to set the appropriate type later like it is done here:
> >>>
> >>> <http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs>
> >>>
> >>> Johannes
>
> --
> Paul Bouché

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address