On Mon, Feb 26, 2007 at 09:24:29AM -0600, Patrick R. Michaud wrote:
: On Mon, Feb 26, 2007 at 07:09:58AM -0800, Jerry Gay wrote:
: > # New Ticket Created by Jerry Gay
: > # Please include the string: [perl #41623]
: > # in the subject line of all future correspondence about this issue.
: > # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=41623 >
: >
: >
: > pge's syntax for specifying ops to the op precedence parser should
: > follow the perl 6 spec in it's op rule naming convention. that is,
: >     'infix:+'
: >     'circumfix:( )'
: >
: > should be
: >     infix:<+>
: >     circumfix:<( )>
:
: We should also note that with Larry's recent work
: on an "official" Perl 6 grammar [1], the syntax for
: defining tokens may in fact be radically changing
: from what pgc is currently using..  For example,
: infix:<+> appears to be written as
:
:     token infix ( --> Additive)                 #+ +
:         { <sym: +> {*} }                        #= +
:
: and circumfix:<( )> is
:
:     token circumfix ( --> Term)                 #+ ( )
:         { \( <EXPR> \) { @<sym>:=<( )> } {*} }  #= ( )
:
: So there could be more substantial pgc formatting changes
: going on here than meets the eye.  (I'm still figuring out
: what all of the syntax means in the Perl-6.0.0-STD.pm
: file.)

What's basically going on here is the attempt to detangle the user's
namespace from the grammar's namespace.  The user will name operators
using &infix:<+>, but if we use the same name for the parsing rule as
for the eventual operator, a name collision results.  &infix:<+> is
just a funny name for a function that adds two things, not the name
for the parse rule that parses plus.  If we were to refer to
&infix:<+> within the grammar, it would be the grammar's *own*
definition of the plus operator, not the one we're trying to define
for the user.

In order to distinguish the names I originally went with a form more
like this:

    token PARSE_circumfix:<( )> { \( <EXPR> \) }

But I noticed several problems with that.  First is simply the
ugliness of the implied name mangling.  Next is the redundancy of
having to repeat the symbol.  Nearly all of the rules end up parsing
exactly the same prefix as the symbol name, though in the case of
circumfix you see that the actual symbol comes in two pieces in the
regex.  Another problem is that the symbol is hardwired into the name,
so you can't write a rule that parses to more than one symbol.

Both of these problems go away if we simply construct the symbol from
the name of the rule plus the list of $<sym> bindings.  That also lets
us leave out the PARSE kludge.  And $<sym> bindings already
automatically handle multiple bindings by generating a list, which is
exactly what circumfix:<( )> wants for its pseudo-subscript.

Another problem with that form is that the precedence is unspecified.
I first tried to mix in the precedence of these various operators via
property or role, but then realized I needed to generate the
precedence on the fly anyway for certain meta-operators, and the
natural place to handle that is in the return processing from the
rule.  One would like to simultaneously have a declarative solution
that eventually turns into a call to something we can tweak.
That's what the ( --> Additive) notation gives us--that's just a
return type coercion in Perl 6.  And why introduce new traits when the
signature is already supposed to handle return type coercion?

The other stuff in the file can mostly be ignored unless you're
writing a bootstrapping parser, in which case you might want to
examine the #+ and #= comments and the {*} action points to preprocess
the file into something your bootstrap compiler can handle, since most
such processors will not be expected to handle full P6 syntax.
Indeed, not even pugs can parse the file currently without the help of
a preprocessor.

The other big thing that's going on in Perl-6.0.0-STD.pm is that I've
assumed multi dispatch semantics for rules that have the same name but
can be differentiated by their longest-token prefix (and by their
normal function arguments, if there are any).  So when you call a rule
like <circumfix> you are calling into all the rules named "circumfix",
whether defined by your grammar or by some derived grammar.  Deriving
grammars is how the user can add things like their own circumfix macro
without influencing anything outside their lexical scope.

Allowing people to do multi dispatch based on what the pattern
actually matches rather than an artificially generated name prevents a
large class of errors, I expect.  Otherwise you end up generating
arrays or hashes of operators to dispatch to, and such data structures
cannot be overridden piecemeal within a derived grammar without
reinventing the dispatcher (badly).

Anyway, that's where it stands at the moment.  What you see in the
latest Perl 6 grammar is the result of trying to follow your basic,
everyday design principles like:

    Don't Repeat Yourself
    Don't Force Artificial Name Generation
    Don't Decide Things Prematurely
    Limit Damage to the Smallest Scope
    Avoid Magical Action at a Distance
    Reduce, Reuse, Recycle...  :)

Larry
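
For anyone who finds the two mechanisms above easier to follow as
running code, here is a rough sketch in Python.  It is not taken from
Perl-6.0.0-STD.pm and is not how PGE or pugs work.  It only models the
idea that (a) the operator name is built from the rule name plus the
list of sym bindings, and (b) calling a rule by name tries every
same-named rule, including inherited ones, and keeps the one with the
longest leading token.  The Rule and Grammar classes, the regexes, the
circumfix:<[ ]> rule, and the Multiplicative and Exponentiation
precedence names are all assumptions made up for the illustration;
only Additive and Term appear in the examples quoted above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    category: str    # shared rule name, e.g. "infix" or "circumfix"
    pattern: str     # regex; each capture group is one piece of the symbol
    precedence: str  # stands in for the ( --> Additive) return type

    def match(self, text):
        m = re.match(self.pattern, text)
        if m is None:
            return None
        syms = list(m.groups())  # plays the role of the $<sym> bindings
        # The user-visible name is built from the rule name plus the
        # bindings, so the symbol is never spelled out in the rule's name.
        name = f"{self.category}:<{' '.join(syms)}>"
        return name, self.precedence, len(syms[0])


@dataclass
class Grammar:
    rules: list = field(default_factory=list)
    parent: "Grammar | None" = None  # a derived grammar points at its base

    def candidates(self, category):
        # Every rule with this name: our own plus everything inherited.
        inherited = self.parent.candidates(category) if self.parent else []
        return [r for r in self.rules if r.category == category] + inherited

    def parse(self, category, text):
        hits = [r.match(text) for r in self.candidates(category)]
        hits = [h for h in hits if h is not None]
        if not hits:
            return None
        # Longest leading token wins among the same-named candidates.
        name, precedence, _ = max(hits, key=lambda h: h[2])
        return name, precedence


# Hypothetical base grammar; only Additive and Term come from the examples.
base = Grammar(rules=[
    Rule("infix", r"(\+)", "Additive"),
    Rule("infix", r"(\*)", "Multiplicative"),
    Rule("infix", r"(\*\*)", "Exponentiation"),
    Rule("circumfix", r"(\()[^()]*(\))", "Term"),
])

# A derived grammar adds one circumfix without touching the base grammar.
derived = Grammar(
    rules=[Rule("circumfix", r"(\[)[^\[\]]*(\])", "Term")],
    parent=base,
)

print(base.parse("infix", "** 2"))            # ('infix:<**>', 'Exponentiation')
print(derived.parse("circumfix", "[1, 2]"))   # ('circumfix:<[ ]>', 'Term')
print(base.parse("circumfix", "[1, 2]"))      # None
```

The point of the sketch is the last three lines: no table of operator
names is kept anywhere, the "**" rule beats the "*" rule purely on
what it matched, and the derived grammar's extra circumfix is visible
only through the derived grammar.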