Graham Wideman wrote:
> Hi Sam,
> 
> Thanks for your comments. More below on your questions:
> 
>> I'm curious as to why you want to sometimes consider whitespace, though. 
>> Is this a self-designed language, or a specification you're working from 
>> that makes whitespace 'sometimes' significant?
>>
>> Your example was a function call or declaration. You can always get help 
>> from the lexer here if there are situations where there *must* be a 
>> space, and situations where there *mustn't* be a space, and nothing 
>> else... have tokens that include the lparen.
> 
> Yes, I am considering the least-messy way to tackle a few of these issues in 
> PHP. (And the function example I gave was just a simple example, not a 
> problem in PHP.)
> 
> One example that PHP has is the use of "$" as a prefix to identifiers, 
> sometimes.
> 
> An ordinary variable:
> 
>     $myvar    = 'hello';
>     $othervar = $myvar;
> 
> Everywhere that such a variable appears, the dollar prefix is required, and 
> no space is allowed. Now it's tempting to write the grammar as:
> 
> variableName 
>     : Dollar Identifier ...
> ...
> Identifier
>     : ('a'..'z' | 'A'..'Z' | '_')  ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*
> 
> This Identifier rule is good for all named things in PHP, but the parser rule 
> would allow whitespace between $ and Identifier, which is not accepted by the 
> actual PHP parser.  
> 
> So, maybe it's better to stick the "$" at the beginning of the lexer rule for 
> Identifier (call it DollarIdentifier or something).
> 
> But then you get to variables that are members of a class/object. 
> 
>     class C {
>         var $mymember = 'Hello';
>         ...
>     }
>     $c = new C();
>     print $c->mymember;
> 
> Note how the declaration uses a $ prefix, but the usage does not (the only $ 
> is on the object variable, not the id of the member variable).  But I'm 
> somewhat loath to handle the $ sometimes in lexer rules, and sometimes in 
> parser rules, as this seems apt to confuse later. (Maybe not... I haven't 
> assessed how messy it gets going down this path.)
> 
> I do indeed see ways to lex/parse this more strictly, I'm just exploring for 
> the least messy way.

My limited experience has shown me that the more complex way usually 
ends up less messy in the end...

I'd lex $id and id entirely separately, as they are syntactically 
distinct entities. $blah is always a variable, a "true" variable, and 
$c->member should be three tokens - a VARIABLEID ($c), a MEMBER (->) and 
an ID (member). If PHP requires there be no space between those tokens, 
then that might be a problem, but conceptually you'd parse it to a tree like

^(MEMBER VARIABLEID ID)

or, filling in values,

^(MEMBER $c member)

The point being that -> is a member operator. Your tree walker would see 
the $member in the declaration and give that class a member called 
member, perhaps, which the MEMBER operator would then find. It's easy to 
trim a '$' from, or add one to, a string.
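
As a rough sketch of the idea (in Python rather than an ANTLR grammar, so 
it can be run on its own): the token names follow the ones above, but the 
helper functions tokenize, parse_member_access and declare_member are made 
up for illustration, and this is nowhere near a real PHP lexer.

```python
import re

# Hypothetical token table. '$' plus the name is ONE lexer token, so no
# whitespace can slip between them; a bare name is a separate ID token.
TOKEN_SPEC = [
    ("VARIABLEID", r"\$[A-Za-z_][A-Za-z0-9_]*"),
    ("MEMBER", r"->"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("WS", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, text) pairs, dropping whitespace tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_member_access(tokens):
    """Build ^(MEMBER VARIABLEID ID) as a nested tuple."""
    kinds = [kind for kind, _ in tokens]
    if kinds != ["VARIABLEID", "MEMBER", "ID"]:
        raise SyntaxError(f"expected VARIABLEID MEMBER ID, got {kinds}")
    return ("MEMBER", tokens[0][1], tokens[2][1])


def declare_member(class_members, declared_name, value):
    """What a tree walker might do for 'var $mymember = ...':
    trim the '$' and store the member under its bare name."""
    bare = declared_name[1:] if declared_name.startswith("$") else declared_name
    class_members[bare] = value


members = {}
declare_member(members, "$mymember", "Hello")
tree = parse_member_access(tokenize("$c->mymember"))
looked_up = members[tree[2]]  # the MEMBER operator finds the bare name
```

Here "$ c" fails in the lexer, because a lone '$' matches no token. The 
sketch skips whitespace between tokens, so it says nothing about whether 
PHP permits spaces around ->.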

-- 
Sam Barnett-Cormack

List: http://www.antlr.org/mailman/listinfo/antlr-interest
