On Thu, 2008-12-04 at 22:46 -0800, Kenny Leung wrote:
> Hi All.
>
> I thought I would get my feet wet by writing a parser for Objective-C
> type encodings. I thought it would be pretty easy for such a brief
> "language", but it is turning out to be pretty difficult.
>
> One of the problems lies in parsing something like this:
>
> {vids=^vids}
>
> which means a struct named "vids", which is composed of void * (^v),
> int, double, and short.
>
> After the "{", I need to interpret vids as a single token, and after
> the "=", I need to interpret the characters as separate tokens.
>
> One of the interesting things I found was that this is legal:
>
> NUMBER : '0'..'9';

This is a lexer rule that turns a stream of characters into a token for
the parser. Lexer rules start with an upper case letter.

> but this is not:
>
> number : '0'..'9';

This is a parser rule (it starts with a lower case letter). You cannot
use ranges here because you cannot guarantee that the two separate
tokens '0' and '9', which this rule auto-creates, have any meaning as a
range.

> I bumped into this because I thought I'd "inline" the rule for the
> name after the "{". Can someone explain this?
>
> Is there a way I can say, "use tokenizer rule A after the "{", but use
> tokenizer rule B after the "=".

No. The lexer (all the UpperCase rules) runs first and creates all the
tokens, then the parser (all the lowerCase rules) runs against the
pre-determined tokens. The thing that almost everyone runs into is that
the parser cannot influence the lexer, as the lexer has already run.

Don't use 'XXX' in your parser rules; create a token:

XXX : 'XXX' ;

and use the symbol XXX in your parser rules. For problems like the above
you need a rule set something like:

// Lexer
POINT  : '^' ;
OPEQ   : '=' ;
LBRACE : '{' ;
RBRACE : '}' ;
ID     : ('a'..'z'|'A'..'Z')+ ;
WS     : (' '|'\t')+ { $channel=HIDDEN; } ;

// Parser
struct
    : LBRACE ID OPEQ structSpec RBRACE
    ;

structSpec
    : ( i=ID    { checkIdChars($i.text); }
      | p=POINT { checkPointer(); }
      )+
    ;

Instead of trying to get the individual characters of the type spec,
just consume it as a set of natural tokens, then separate everything out
afterwards, or you will get into a mess. Don't try to think of the
parser in human terms; think of what the easiest token set to produce
is, then what this stream of tokens is going to look like in the parser.

The parser should accept any syntax that is potentially valid and then
apply semantic checks. For instance, above, any ID is accepted, and then
you check the characters of the spec. This allows you to issue an error
such as: "Invalid type specification character at line n, offset y",
instead of "Syntax error."
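
To make the "tokenize first, check afterwards" idea concrete, here is a rough sketch in plain Python rather than ANTLR. It is only an illustration under assumptions: the names tokenize, parse_struct and TYPE_CODES are made up for this example, the type table covers only the four codes from the question, and the post does not say what checkIdChars() and checkPointer() actually do, so the checks below are a guess at their intent.

```python
import re

# Token names mirror the lexer rules in the grammar above.
TOKEN_SPEC = [
    ("POINT", r"\^"),
    ("OPEQ", r"="),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("ID", r"[A-Za-z]+"),
    ("WS", r"[ \t]+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

# Hypothetical table: only the codes used in the example are listed.
TYPE_CODES = {"v": "void", "i": "int", "d": "double", "s": "short"}


def tokenize(text):
    """Phase 1: turn the whole input into (kind, text, offset) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"Unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":  # whitespace is dropped, like the HIDDEN channel
            tokens.append((m.lastgroup, m.group(), pos))
        pos = m.end()
    return tokens


def parse_struct(text):
    """Phase 2: struct : LBRACE ID OPEQ structSpec RBRACE, on ready-made tokens."""
    tokens = tokenize(text)
    kinds = [kind for kind, _, _ in tokens]
    if len(tokens) < 5 or kinds[:3] != ["LBRACE", "ID", "OPEQ"] or kinds[-1] != "RBRACE":
        raise ValueError("Syntax error")

    name = tokens[1][1]
    fields = []
    pointer_depth = 0
    for kind, tok_text, offset in tokens[3:-1]:
        if kind == "POINT":
            # Guess at checkPointer(): remember that the next type is pointed to.
            pointer_depth += 1
        elif kind == "ID":
            # Guess at checkIdChars(): split the ID into one-letter type codes.
            for i, ch in enumerate(tok_text):
                if ch not in TYPE_CODES:
                    raise ValueError(
                        f"Invalid type specification character {ch!r} "
                        f"at offset {offset + i}"
                    )
                stars = " " + "*" * pointer_depth if pointer_depth else ""
                fields.append(TYPE_CODES[ch] + stars)
                pointer_depth = 0  # a pointer applies to one type only
        else:
            raise ValueError("Syntax error")
    if pointer_depth:
        raise ValueError("Pointer marker '^' is not followed by a type")
    return name, fields
```

With this sketch, parse_struct("{vids=^vids}") returns ("vids", ["void *", "int", "double", "short"]). The point is that the tokenizer never needs to know whether it is before or after the "=": the same ID token is used as a name in one position and split into type codes in the other, which is also where the specific error message comes from.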

Make sure you read the FAQs and getting started articles on the Wiki,
and if you have the money, buy the book. Inspecting the example grammars
and contributed grammars is a good idea too.

Jim

> AntlrWorks has been great for learning by playing. Thanks for any help!
>
> -Kenny
>
> List: http://www.antlr.org:8080/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org:8080/mailman/options/antlr-interest/your-email-address