Graham Wideman wrote:
> Hi Sam,
>
> Thanks for your comments. More below on your questions:
>
>> I'm curious as to why you want to sometimes consider whitespace, though.
>> Is this a self-designed language, or a specification you're working from
>> that makes whitespace 'sometimes' significant?
>>
>> Your example was a function call or declaration. You can always get help
>> from the lexer here if there are situations where there *must* be a
>> space, and situations where there *mustn't* be a space, and nothing
>> else... have tokens that include the lparen.
>
> Yes, I am considering the least-messy way to tackle a few of these issues in
> PHP. (And the function example I gave was just a simple example, not a
> problem in PHP.)
>
> One example that PHP has is the use of "$" as a prefix to identifiers,
> sometimes.
>
> An ordinary variable:
>
>   $myvar = 'hello';
>   $othervar = $myvar;
>
> Everywhere that such a variable appears, the dollar prefix is required, and
> no space is allowed. Now it's tempting to write the grammar as:
>
>   variableName
>     : Dollar Identifier ...
>   ...
>   Identifier
>     : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*
>
> This Identifier rule is good for all named things in PHP, but the parser rule
> would allow whitespace between $ and Identifier, which is not accepted by the
> actual PHP parser.
>
> So, maybe it's better to stick the "$" at the beginning of the lexer rule for
> Identifier (call it DollarIdentifier or something).
>
> But then you get to variables that are members of a class/object.
>
>   class C {
>     var $mymember = 'Hello';
>     ...
>   }
>   $c = new C();
>   print $c->mymember;
>
> Note how the declaration uses a $ prefix, but the usage does not (the only $
> is on the object variable, not the id of the member variable). But I'm
> somewhat loath to handle the $ sometimes in lexer rules, and sometimes in
> parser rules, as this seems apt to confuse later. (Maybe not... I haven't
> assessed how messy it gets going down this path.)
>
> I do indeed see ways to lex/parse this more strictly, I'm just exploring for
> the least messy way.

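To make the whitespace issue above concrete, here is a minimal Python sketch (not ANTLR, and not PHP's actual lexer) contrasting the two options from the quoted message. The regex-based tokenizer and the names SPLIT_TOKENS, FUSED_TOKENS, tokenize, is_variable_split and is_variable_fused are made up for illustration. The one assumption carried over is that whitespace between tokens is skipped before the parser sees anything, which is the usual setup.

```python
import re

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

# Option A: '$' and the identifier are separate tokens (Dollar, Identifier).
# Whitespace in front of any token is silently skipped.
SPLIT_TOKENS = re.compile(r"\s*(?:(?P<DOLLAR>\$)|(?P<IDENT>" + IDENT + r"))")

# Option B: the '$' is part of the lexer rule, so "$name" is one token.
FUSED_TOKENS = re.compile(
    r"\s*(?:(?P<VARIABLEID>\$" + IDENT + r")|(?P<IDENT>" + IDENT + r"))"
)


def tokenize(pattern, text):
    """Return the list of token type names, or None if the input cannot be lexed."""
    text = text.rstrip()
    pos, types = 0, []
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            return None
        types.append(m.lastgroup)  # name of the alternative that matched
        pos = m.end()
    return types


def is_variable_split(text):
    """Parser-level rule: variableName : Dollar Identifier"""
    return tokenize(SPLIT_TOKENS, text) == ["DOLLAR", "IDENT"]


def is_variable_fused(text):
    """Lexer-level rule: a variable is a single VARIABLEID token."""
    return tokenize(FUSED_TOKENS, text) == ["VARIABLEID"]
```

With option A, both "$myvar" and "$ myvar" come out as DOLLAR IDENT, so the parser rule cannot tell them apart. With option B, "$ myvar" fails in the lexer because a bare "$" matches no token.
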
My limited experience has shown me that the more complex way usually ends
up less messy in the end... I'd lex $id and id entirely separately, as
they are syntactically distinct entities. $blah is always a variable, a
"true" variable, and $c->member should be three tokens - a VARIABLEID
($c), a MEMBER (->) and an ID (member). If PHP requires there be no space
between those tokens, then that might be a problem, but conceptually
you'd parse it to a tree like

  ^(MEMBER VARIABLEID ID)

or, filling in values,

  ^(MEMBER $c member)

The point being that -> is a member operator. Your tree walker would see
the $member declaration and give that class a member called member,
perhaps, which the MEMBER operator would then find. It's easy to trim a
'$' from a string, or to add one.

-- 
Sam Barnett-Cormack

List: http://www.antlr.org/mailman/listinfo/antlr-interest
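A rough Python sketch of the three-token idea, for illustration only. It is not ANTLR output and not PHP's real lexer; the names TOKEN, lex, parse_member_access and declare_member are hypothetical, and the tuple ("MEMBER", "$c", "member") merely stands in for the tree ^(MEMBER $c member). The sketch assumes whitespace between tokens is skipped, so it does not settle whether PHP allows spaces around "->".

```python
import re

TOKEN = re.compile(r"""
    \s*(?:
        (?P<VARIABLEID>\$[A-Za-z_][A-Za-z0-9_]*)   # '$' fused into the token
      | (?P<MEMBER>->)                             # member-access operator
      | (?P<ID>[A-Za-z_][A-Za-z0-9_]*)             # bare identifier
    )""", re.VERBOSE)


def lex(text):
    """Return a list of (type, text) pairs; raise ValueError if no rule matches."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot lex at offset {pos}: {text[pos:]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse_member_access(text):
    """memberAccess : VARIABLEID MEMBER ID, rewritten to (MEMBER, variable, id)."""
    tokens = lex(text)
    if [kind for kind, _ in tokens] != ["VARIABLEID", "MEMBER", "ID"]:
        raise ValueError(f"not a member access: {text!r}")
    (_, variable), _, (_, member) = tokens
    return ("MEMBER", variable, member)


def declare_member(class_members, declared_name):
    """Tree-walker step: a declaration of '$name' registers the member as 'name'."""
    if declared_name.startswith("$"):
        declared_name = declared_name[1:]  # trim the '$' prefix
    class_members.add(declared_name)
```

A walker built along these lines would call declare_member when it meets the "var $mymember" declaration, and later look up the third element of the tuple returned by parse_member_access("$c->mymember") in the same set.
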