[Note that the version I posted using NOTNEWLINE* solves the original poster's problem (as would appending '\n' to the stream before lexing, although that requires writing extra code). The discussion below is just about details of why other approaches using .* don't work.]
Gavin Lambert wrote:
> At 09:05 19/08/2009, consili...@gmail.com wrote:
> >For testing I removed the .* and, while there are no errors, it
> >still doesn't match b. as the token MC_INCORRECT unless there
> >is a newline after it.
> [...]
> >MC_QUESTION : INT ('.'|')') ENDOFLINE;
> >MC_INCORRECT : LETTER '.' ENDOFLINE;
> >MC_CORRECT : '*' MC_INCORRECT;
> >
> >fragment ENDOFLINE : NEWLINE | { input.LA(1) == EOF }?;
>
> Are you using the debugger or the interpreter to test with? The
> interpreter doesn't execute predicates, so it won't work properly;
> you need to use the debugger.

Right.

> It also might pay to try a few variations on the ENDOFLINE rule;
> sometimes ANTLR seems to ignore predicates if it thinks that
> they're not accomplishing anything. Try this, for example:
>
>   fragment ENDOFLINE : { input.LA(1) == EOF }? => | NEWLINE ;
>
> or this:
>
>   fragment ENDOFLINE : NEWLINE | EOF ;

ENDOFLINE can indeed be simplified to NEWLINE | EOF. However, that won't
help, because the predicate is not what causes the problem here. The
problem is that the match immediately following .* uses the '|' operator.

Note that it doesn't matter whether this match is "inlined" or in a
separate fragment rule, and it also doesn't matter whether
(options { greedy=false; } : .)* is used instead of .*. For instance,
this version of MC_QUESTION still produces the warning:

  MC_QUESTION : INT ('.'|')') (options { greedy=false; } : .)* ('\r'? '\n' | EOF);

Either of these works, and does not warn (but does not accept end-of-file):

  MC_QUESTION : INT ('.'|')') .* NEWLINE;

or

  MC_QUESTION : INT ('.'|')') .* ('\r'? '\n');

but this produces the warning:

  MC_QUESTION : INT ('.'|')') .* ('\r' '\n' | '\n');

even though you would normally expect ('\r'? '\n') to be equivalent to
('\r' '\n' | '\n').

Therefore, .* can't be used in cases where the match following it
necessarily involves an alternation that can't be expressed using '?'.
"NEWLINE | EOF" is such a case, which is why the end-of-file variants
above still warn.
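
For reference, the alternative mentioned in the note at the top (appending '\n' to the input before lexing) could look roughly like the following Java sketch. The class and method names are made up for illustration, and it assumes the whole input is available as a String before it is handed to the lexer:

```java
// Hypothetical helper, not part of ANTLR: guarantees that non-empty input
// ends with '\n', so that line-oriented lexer rules can require NEWLINE
// and never need to match EOF.
final class EnsureTrailingNewline {

    private EnsureTrailingNewline() {}

    static String withTrailingNewline(String text) {
        // Empty input contains no lines, so there is nothing to terminate.
        if (text.isEmpty() || text.endsWith("\n")) {
            return text;
        }
        return text + "\n";
    }

    public static void main(String[] args) {
        // The last line "b." has no terminator in the raw input.
        String raw = "1. Question\nb.";
        String fixed = withTrailingNewline(raw);
        System.out.println(fixed.endsWith("\n"));
    }
}
```

With the ANTLR 3 Java runtime, the result would then be wrapped as new ANTLRStringStream(withTrailingNewline(text)) and passed to the generated lexer. Rules such as MC_QUESTION could then end in plain NEWLINE, which is one of the forms that does not warn.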
-- 
David-Sarah Hopwood ⚥ http://davidsarah.livejournal.com