>...snipped....
>> ...... but I still have no solution to my
> problem: how can I make the variable in my label rule be anything? That
> is, I would think anything except whitespace and braces and control
> characters would be fine. In particular, it definitely has to accept
> any word in any script, along with some punctuation characters such as .
> - _ $ and probably more.

The best solution is to redefine your language so that a LABEL is a
quoted string or something similar that the Lexer can identify. But you
probably don't have control over your language definition, otherwise
you would have done the redefinition already and moved on to more
important stuff. So anyway....

An alternative, which has lots of problems and which I hesitate to
mention, is to make LABEL a parser rule, like this (just one way; other
ways are possible, probably involving syntactic predicates):

    // label formulas
    label     : labelHead VARIABLE labelTail ;
    labelHead : FUNCTION | CATEGORY | WORD | LEMMA | MORPHOLOGY ;
    labelTail : (~CLOSE)+ ;

This also means that a Lexer rule such as:

    ANY : . ;

must be added as the VERY LAST rule in the Lexer. This ensures that any
character not recognized as a token by the other Lexer rules gets
identified as an ANY token. Note that ANY is intentionally not `.+`, as
that would run afoul of the greedy nature of Antlr's lexing strategy
and consume all characters. ANY could be tweaked to exclude control
characters and perhaps other characters.

Under the above Parser rules a labelTail will match any non-empty
sequence of tokens up to but not including a CLOSE token. It will also
span whitespace and comments, since those tokens are on the HIDDEN
channel and not seen by the parser. Not sure if that is what you want;
it is, I believe, the same functionality as your original

    LABEL : ~(')')+;

rule, which also happily ate whitespace and comments.

Another bad part of the above labelTail rule is that it is now a list
of tokens rather than a single token. So whatever processing you are
performing upon the parsed result (eventually an AST, perhaps?) will be
much more complicated. Further, the list of tokens may seem goofy: for
your original test input "(word x Einführung)", the labelTail will be a
list of 3 tokens: VARIABLE, ANY, VARIABLE.
That is, it will be the tokens for "Einf" (a VARIABLE), "ü" (an ANY),
and "hrung" (a VARIABLE). I do not speak your language so I am only
marginally bothered by this, but your mileage may vary.

Hope this helps...

   -jbb

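P.S. A rough way to see the token split without running ANTLR is to
mimic the lexer in a few lines of Python. This is only a sketch, not
ANTLR itself: the rule table, tokenize() and label_tail() are all
hypothetical names, the matching is a simple first-rule-wins loop, and
I am assuming that VARIABLE only covers ASCII letters and digits, which
is what would push "ü" into ANY. Your real VARIABLE rule may differ.

```python
import re

# Hypothetical stand-ins for the Lexer rules, tried in order.
# ASSUMPTION: VARIABLE is ASCII-only; the real rule is not shown in this thread.
TOKEN_RULES = [
    ("OPEN", re.compile(r"\(")),
    ("CLOSE", re.compile(r"\)")),
    ("WORD", re.compile(r"word(?![A-Za-z0-9])")),
    ("WS", re.compile(r"\s+")),
    ("VARIABLE", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("ANY", re.compile(r".")),  # single-character catch-all, deliberately last
]


def tokenize(text):
    """Return (name, text) pairs, dropping WS the way a hidden channel would."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            if match:
                if name != "WS":  # WS is never seen by the "parser"
                    tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos}")
    return tokens


def label_tail(tokens):
    """Mimic labelTail : (~CLOSE)+ for input shaped like
    OPEN labelHead VARIABLE labelTail CLOSE."""
    tail = []
    for name, text in tokens[3:]:  # skip OPEN, labelHead, VARIABLE
        if name == "CLOSE":
            break
        tail.append((name, text))
    return tail


# "\u00fc" is the precomposed character "ü".
tail = label_tail(tokenize("(word x Einf\u00fchrung)"))
print([name for name, _ in tail])
print("".join(text for _, text in tail))
```

Joining the token texts back together, as the last line does, is one
way to treat the tail as a single label again. It only recovers the
original text when the label contains no whitespace or comments, since
those never reach the parser.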
List: http://www.antlr.org:8080/mailman/listinfo/antlr-interest