> If you want the longest match, then left factor everything and let it
> do that:
>
> A ( B (C|) |) ;
>
> And set the token type at the appropriate points.
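
For concreteness, the left-factored rule quoted above can be read roughly like the hand-written sketch below. It is Python rather than ANTLR, and it assumes that A, B and C stand for the literal characters 'a', 'b' and 'c'. The token names A, AB and ABC and the functions next_token and tokenize are made up for illustration; this is not what ANTLR generates.

```python
def next_token(text, pos=0):
    """Return (token_type, lexeme) for the longest of 'a', 'ab', 'abc' at pos.

    Mirrors the shape of  A ( B (C|) |) ;  where each empty alternative
    is a point at which the token type is settled.
    """
    if not text.startswith("a", pos):
        raise ValueError(f"no token starts at position {pos}")
    token_type, end = "A", pos + 1            # matched the common prefix A
    if text.startswith("b", end):             # optional B
        token_type, end = "AB", end + 1
        if text.startswith("c", end):         # optional C
            token_type, end = "ABC", end + 1
    return token_type, text[pos:end]


def tokenize(text):
    """Split text into tokens, reading each character exactly once."""
    tokens = []
    pos = 0
    while pos < len(text):
        token_type, lexeme = next_token(text, pos)
        tokens.append((token_type, lexeme))
        pos += len(lexeme)
    return tokens


print(tokenize("abcaba"))
```

Because every alternative shares a fixed-length prefix, one character of lookahead is enough at each decision point and nothing is ever rescanned.
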
Not always so easy, however. My original example, simplified even further, was something like this:

    FOO:  'foo';
    BAR:  'bar';
    FOOZ: 'foo'* 'z';

It might be possible to refactor this using emit() or something similar; I'm not sure. It would be difficult, anyway.

An alternative would be to force backtracking with syntactic predicates, in the manner Indhu suggested in a previous reply. But that means the lexer would scan the same input more than once, and avoiding that is more or less why I use a lexer generator in the first place instead of just matching the input with regexps.

By the way, I got around my own problem with the URL/IDENT conflict by folding the URL into the larger context where it appears, so that the lexer returns one larger token which is split up later. This seemed to be the most bearable inelegance in my situation.

J'

List: http://www.antlr.org/mailman/listinfo/antlr-interest