[Correcting typographic errors noticed after sending original. Sorry for the errors and duplicate posts. Also, in many of the double quoted strings below the backslashes should probably be escaped (i.e., "\\s" instead of "\s") but I'm leaving them alone for readability. --Pm]
* Rules are Parrot subroutines that know how to match strings.
  To compile a rule, one uses the "PGE::p6rule" function:

    .sub main
        .local pmc p6rule
        .local pmc rulesub
        .local pmc match
        load_bytecode "PGE.pbc"
        p6rule = find_global "PGE", "p6rule"
        rulesub = p6rule(":w (\w+) \:= (\S+)")
        match = rulesub("dog := spot")
    .end

* A rule subroutine returns a "match object" containing the
  results of the match. In Perl 6, this object will be known
  as C< $/ >.

* A match object returned from a successful match has the
  following characteristics:
    - true in boolean context
    - 1 in a numeric context (may change later with :g modifier)
    - the string matched in string context
    - .from() and .to() are offsets delimiting the string where
      the match was found
    - contains other match objects resulting from captured
      subpatterns or subrules in the match

* A rule containing capturing parens gets additional match objects
  for each set of parens. Thus a rule like:

                          $1        $2
    rulesub = p6rule(":w (\w+) \:= (\S+)")

  captures the word characters prior to the ":=" into $/[0], and
  the non-space characters following the ":=" into $/[1]. (In Perl 6,
  the $1, $2, ... variables will be aliases to $/[0], $/[1], ... .)

    rulesub = p6rule(":w (\w+) \:= (\S+)")
    match = rulesub(" let foo := 123 ")
    print match                   # outputs "foo := 123"
    $P0 = match[0]                # first subpattern capture ($1)
    print $P0                     # outputs "foo"
    $P0 = match[1]                # second subpattern capture ($2)
    print $P0                     # outputs "123"

* If a capture is quantified with any of '+', '*', or '**{m..n}',
  then it generates an array of match objects for the subpattern
  capture instead of a single match object:

    rulesub = p6rule(":w (\w+) \:= (\S+ )*")
    match = rulesub(" foo := zip boom bah")
    print match                   # outputs "foo := zip boom bah"
    $P0 = match[0]                # first subpattern capture ($1)
    print $P0                     # outputs "foo"
    $P1 = match[1]                # second subpattern array ($2)
    $P2 = $P1[0]                  # first repetition ($2[0])
    print $P2                     # outputs "zip "
    $P2 = $P1[1]                  # second repetition ($2[1])
    print $P2                     # outputs "boom "

* Match objects for nested captures are nested into the surrounding
  capture object. Thus, given

    rulesub = p6rule(":w (let) ( (\w+) \:= (\S+) )")
    match = rulesub("let foo := 123")

  the outer match object contains two match objects ($/[0] and
  $/[1]), and the second of these contains two match objects at
  $/[1][0] and $/[1][1].

    print match                   # outputs "let foo := 123"
    $P0 = match[0]                # first subcapture ($1)
    print $P0                     # outputs "let"
    $P0 = match[1]                # second subcapture ($2)
    $P1 = $P0[0]                  # first nested capture ($2[0])
    print $P1                     # outputs "foo"
    $P1 = $P0[1]                  # second nested capture ($2[1])
    print $P1                     # outputs "123"

* Non-capturing subpatterns don't nest match objects:

    rulesub = p6rule(":w (let) [ (\w+) \:= (\S+) ]")
    match = rulesub("let foo := 123")
    print match                   # outputs "let foo := 123"
    $P0 = match[0]                # first subcapture ($1)
    print $P0                     # outputs "let"
    $P0 = match[1]                # second subcapture ($2)
    print $P0                     # outputs "foo"
    $P0 = match[2]                # third subcapture ($3)
    print $P0                     # outputs "123"

* To define a subrule, store its subroutine into a symbol table
  somewhere:

    rulesub = p6rule("int | double | float | char")
    store_global "type", rulesub
    rulesub = p6rule("\w+")
    store_global "ident", rulesub

* To match a subrule, put the name of the subrule in angle brackets:

    rulesub = p6rule(":w <type> <ident>")
    match = rulesub(" int argc ")
    print match                   # outputs "int argc"

* Subrule captures become named keys in the resulting match object:

    rulesub = p6rule(":w <type> <ident>")
    match = rulesub(" int argc ")
    print match                   # outputs "int argc"
    $P0 = match["type"]           # get type subrule ($/<type>)
    print $P0                     # outputs "int"
    $P0 = match["ident"]          # get ident subrule ($/<ident>)
    print $P0                     # outputs "argc"

* Quantified subrules produce an array of match objects:

    rulesub = p6rule(":w <type> <ident> [ , <ident>]*")
    (match) = rulesub(" float alpha, beta, gamma")
    $P0 = match["type"]           # get type subrule ($/<type>)
    print $P0                     # outputs "float"
    $P0 = match["ident"]          # get ident subrule (array)
    $P1 = $P0[0]                  # first ident ($/<ident>[0])
    print $P1                     # outputs "alpha"
    $P1 = $P0[1]                  # second ident ($/<ident>[1])
    print $P1                     # outputs "beta"

* Captures can be aliased via named aliases:

    rulesub = p6rule(":w $<key>:=[\w+] = $<val>:=[\S+]")
    (match) = rulesub(" abc = 123")
    $P0 = match["key"]            # get "key" capture
    print $P0                     # outputs "abc"
    $P0 = match["val"]            # get "val" capture
    print $P0                     # outputs "123"

* Or you can use numbered aliases:

    rulesub = p6rule(":w $3:=[\w+] = $1:=[\S+]")
    (match) = rulesub(" abc = 123")
    $P0 = match[0]                # get $1
    print $P0                     # outputs "123"
    $P0 = match[2]                # get $3
    print $P0                     # outputs "abc"

PGE provides the "dump" method for match objects to provide a data
dump of the results. Here's a long example for parsing arithmetic
expressions using the following grammar:

    rule factor { \w+ | \( <expr> \) }
    rule term {:w <factor> [ (\*|/) <factor> ]* }
    rule expr {:w <term> [ (\+|-) <term> ]* }

The PIR code, which additionally aliases each operator capture
to $<op>, is

    .sub _main
        .local pmc p6rule
        .local pmc match
        load_bytecode "../../runtime/parrot/library/PGE.pbc"
        p6rule = find_global "PGE", "p6rule"

        $P0 = p6rule("\w+ | \( <expr> \)")
        store_global "factor", $P0
        $P0 = p6rule(":w <factor> [ $<op>:=(\*|/) <factor> ]*")
        store_global "term", $P0
        $P0 = p6rule(":w <term> [ $<op>:=(\+|-) <term> ]*")
        store_global "expr", $P0

        $P0 = p6rule("<expr>")
        match = $P0("ab * (de + fg) - jk")
        match."dump"("$/")
    .end

When this is executed, the match."dump" call produces the following
output, displaying the contents of the match object in $/:

    $/: <ab * (de + fg) - jk @ 0> 1
    $/<expr>: <ab * (de + fg) - jk @ 0> 1
    $/<expr><term>[0]: <ab * (de + fg) @ 0> 1
    $/<expr><term>[0]<op>[0]: <* @ 3> 1
    $/<expr><term>[0]<factor>[0]: <ab @ 0> 1
    $/<expr><term>[0]<factor>[1]: <(de + fg) @ 5> 1
    $/<expr><term>[0]<factor>[1]<expr>: <de + fg @ 6> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[0]: <de @ 6> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[0]<factor>[0]: <de @ 6> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[1]: <fg @ 11> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[1]<factor>[0]: <fg @ 11> 1
    $/<expr><term>[0]<factor>[1]<expr><op>[0]: <+ @ 9> 1
    $/<expr><term>[1]: <jk @ 17> 1
    $/<expr><term>[1]<factor>[0]: <jk @ 17> 1
    $/<expr><op>[0]: <- @ 15> 1
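
The boolean-context behaviour and the .from() and .to() methods
listed near the top have no example of their own, so here is a
minimal, untested sketch of how they might be used together. It
assumes the methods are called with the same quoted method-call
syntax as match."dump" above, and that a failed match yields a
match object that is false in boolean context. Neither assumption
is confirmed in this message, and I have not said whether the
.to() offset is inclusive or exclusive, so the sketch only prints
whatever it gets back. The backslashes are doubled here, as the
note at the top suggests.

```pir
.sub main
    .local pmc p6rule
    .local pmc rulesub
    .local pmc match
    load_bytecode "PGE.pbc"
    p6rule = find_global "PGE", "p6rule"

    # same rule as in the capture examples, with backslashes escaped
    rulesub = p6rule(":w (\\w+) \\:= (\\S+)")
    match = rulesub(" let foo := 123 ")

    # assumption: an unsuccessful match is false in boolean context
    unless match goto no_match

    # assumption: .from()/.to() are invoked like match."dump"(...)
    $I0 = match."from"()          # offset where the match starts
    $I1 = match."to"()            # offset where the match ends

    print "matched '"
    print match                   # string context: the matched text
    print "' from "
    print $I0
    print " to "
    print $I1
    print "\n"
    end

  no_match:
    print "no match\n"
    end
.end
```

Testing the match object before indexing into it keeps the later
match[0], match["type"], etc. lookups from being applied to a
failed match.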