PGE features update (corrections)

Patrick R. Michaud Sun, 08 May 2005 10:39:17 -0700

[Correcting typographic errors noticed after sending original.  Sorry
for the errors and duplicate posts.  Also, in many of the double
quoted strings below the backslashes should probably be escaped
(i.e., "\\s" instead of "\s") but I'm leaving them alone for 
readability.  --Pm]


* Rules are Parrot subroutines that know how to match strings.  To
  compile a rule, one uses the "PGE::p6rule" function:

    .sub main
        .local pmc p6rule
        .local pmc rulesub
        .local pmc match
        load_bytecode "PGE.pbc"
        p6rule = find_global "PGE", "p6rule"

        rulesub = p6rule(":w (\w+) \:= (\S+)")
        match = rulesub("dog := spot")
    .end
 
* A rule subroutine returns a "match object" containing the
  results of the match.  In Perl 6, this object will be known as $/.

* A match object returned from a successful match has the following
  characteristics:
  - true in boolean context
  - 1 in a numeric context (may change later with :g modifier)
  - the string matched in string context
  - .from() and .to() return the offsets delimiting the portion of
    the string where the match was found
  - contains other match objects resulting from captured
    subpatterns or subrules in the match
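
* As a minimal sketch (not part of the original message), these
  characteristics might be inspected from PIR as shown here.  It
  assumes that assigning the match object to an integer or string
  register requests numeric or string context, and that "from" and
  "to" are invoked with the same method-call syntax as "dump".  The
  backslashes are escaped, as suggested in the note at the top:

    .sub main
        .local pmc p6rule
        .local pmc rulesub
        .local pmc match
        load_bytecode "PGE.pbc"
        p6rule = find_global "PGE", "p6rule"

        rulesub = p6rule(":w (\\w+) \\:= (\\S+)")
        match = rulesub("dog := spot")
        if match goto matched              # boolean context
        print "no match"
        goto finish
      matched:
        $I0 = match                        # numeric context (assumed syntax)
        print $I0
        print "\n"
        $S0 = match                        # string context (assumed syntax)
        print $S0
        print "\n"
        $I1 = match."from"()               # offset where the match starts
        print $I1
        print "\n"
        $I2 = match."to"()                 # offset where the match ends
        print $I2
      finish:
        print "\n"
    .end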

* A rule containing capturing parens gets an additional match object
  for each set of parens.  Thus a rule like:

                             $1        $2
        rulesub = p6rule(":w (\w+) \:= (\S+)")

  captures the word characters prior to the ":=" into $/[0], and
  the non-space characters following the ":=" into $/[1].  (In Perl 6,
  the $1, $2, ... variables will be aliases to $/[0], $/[1], and so on.)

        rulesub = p6rule(":w (\w+) \:= (\S+)")
        match = rulesub(" let foo := 123 ")
        print match                        # outputs "foo := 123"
        $P0 = match[0]                     # first subpattern capture ($1)
        print $P0                          # outputs "foo"
        $P0 = match[1]                     # second subpattern capture ($2)
        print $P0                          # outputs "123"

* If a capture is quantified with any of '+', '*', or '**{m..n}',
  then it generates an array of match objects for the subpattern capture
  instead of a single match object:

        rulesub = p6rule(":w (\w+) \:= (\S+ )*")
        match = rulesub(" foo := zip boom bah")
        print match                        # outputs "foo := zip boom bah"
        $P0 = match[0]                     # first subpattern capture ($1)
        print $P0                          # outputs "foo"
        $P1 = match[1]                     # second subpattern array ($2)
        $P2 = $P1[0]                       # first repetition ($2[0])
        print $P2                          # outputs "zip "
        $P2 = $P1[1]                       # second repetition ($2[1])
        print $P2                          # outputs "boom "

* Match objects for nested captures are nested into the surrounding
  capture object.  Thus, given

        rulesub = p6rule(":w (let) ( (\w+) \:= (\S+) )")
        match = rulesub("let foo := 123")

  the outer match object contains two match objects ($/[0] and $/[1]),
  and the second of these contains two match objects at
  $/[1][0] and $/[1][1].

        print match                        # outputs "let foo := 123"
        $P0 = match[0]                     # first subcapture ($1)
        print $P0                          # outputs "let"
        $P0 = match[1]                     # second subcapture ($2)
        $P1 = $P0[0]                       # first nested capture ($2[0])
        print $P1                          # outputs "foo"
        $P1 = $P0[1]                       # second nested capture ($2[1])
        print $P1                          # outputs "123"

* Non-capturing subpatterns don't nest match objects:

        rulesub = p6rule(":w (let) [ (\w+) \:= (\S+) ]")
        match = rulesub("let foo := 123")
        print match                        # outputs "let foo := 123"
        $P0 = match[0]                     # first subcapture ($1)
        print $P0                          # outputs "let"
        $P0 = match[1]                     # second subcapture ($2)
        print $P0                          # outputs "foo"
        $P0 = match[2]                     # third subcapture ($3)
        print $P0                          # outputs "123"

* To define a subrule, store its compiled subroutine as a global symbol
  under the subrule's name:

        rulesub = p6rule("int | double | float | char")
        store_global "type", rulesub 
        rulesub = p6rule("\w+")
        store_global "ident", rulesub

* To match a subrule, put the name of the subrule in angle brackets:

        rulesub = p6rule(":w <type> <ident>")
        match = rulesub("   int argc ")
        print match                        # outputs "int argc"

* Subrule captures become named keys in the resulting match object:

        rulesub = p6rule(":w <type> <ident>")
        match = rulesub("   int argc ")
        print match                        # outputs "int argc"
        $P0 = match["type"]                # get type subrule  ($/<type>)
        print $P0                          # outputs "int"
        $P0 = match["ident"]               # get ident match ($/<ident>)
        print $P0                          # outputs "argc" 

* Quantified subrules produce an array of match objects:

        rulesub = p6rule(":w <type> <ident> [ , <ident>]*")
        (match) = rulesub("    float alpha, beta, gamma")
        $P0 = match["type"]                # get type subrule ($/<type>)
        print $P0                          # outputs "float"
        $P0 = match["ident"]               # get ident subrule (array)
        $P1 = $P0[0]                       # first ident ($/<ident>[0])
        print $P1                          # outputs "alpha"
        $P1 = $P0[1]                       # second ident ($/<ident>[1])
        print $P1                          # outputs "beta"

* Captures can be aliased via named aliases:

        rulesub = p6rule(":w $<key>:=[\w+] = $<val>:=[\S+]")
        (match) = rulesub("   abc = 123")
        $P0 = match["key"]                 # get "key" capture
        print $P0                          # outputs "abc"
        $P0 = match["val"]                 # get "val" capture
        print $P0                          # outputs "123"

* Or you can use numbered aliases:

        rulesub = p6rule(":w $3:=[\w+] = $1:=[\S+]")
        (match) = rulesub("   abc = 123")
        $P0 = match[0]                     # get $1
        print $P0                          # outputs "123"
        $P0 = match[2]                     # get $3
        print $P0                          # outputs "abc"

Match objects also have a "dump" method that prints a data dump of
the results.  Here's a longer example that parses arithmetic
expressions using the following grammar:

    rule factor { \w+ | \( <expr> \) }
    rule term   {:w <factor> [ (\*|/) <factor> ]* }
    rule expr   {:w <term> [ (\+|-) <term> ]* }

The PIR code (which also aliases the operator captures to $<op>) is:

    .sub _main
        .local pmc p6rule
        .local pmc match
    
        load_bytecode "../../runtime/parrot/library/PGE.pbc"
        p6rule = find_global "PGE", "p6rule"
    
        $P0 = p6rule("\w+ | \( <expr> \)")
        store_global "factor", $P0
    
        $P0 = p6rule(":w <factor> [ $<op>:=(\*|/) <factor> ]*")
        store_global "term", $P0
    
        $P0 = p6rule(":w <term> [ $<op>:=(\+|-) <term> ]*")
        store_global "expr", $P0
    
        $P0 = p6rule("<expr>")
        match = $P0("ab * (de + fg) - jk")
        match."dump"("$/")
    .end

When executed, the match."dump" call produces the following
output, displaying the contents of the match object in $/:

    $/: <ab * (de + fg) - jk @ 0> 1
    $/<expr>: <ab * (de + fg) - jk @ 0> 1
    $/<expr><term>[0]: <ab * (de + fg)  @ 0> 1
    $/<expr><term>[0]<op>[0]: <* @ 3> 1
    $/<expr><term>[0]<factor>[0]: <ab @ 0> 1
    $/<expr><term>[0]<factor>[1]: <(de + fg) @ 5> 1
    $/<expr><term>[0]<factor>[1]<expr>: <de + fg @ 6> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[0]: <de  @ 6> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[0]<factor>[0]: <de @ 6> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[1]: <fg @ 11> 1
    $/<expr><term>[0]<factor>[1]<expr><term>[1]<factor>[0]: <fg @ 11> 1
    $/<expr><term>[0]<factor>[1]<expr><op>[0]: <+ @ 9> 1
    $/<expr><term>[1]: <jk @ 17> 1
    $/<expr><term>[1]<factor>[0]: <jk @ 17> 1
    $/<expr><op>[0]: <- @ 15> 1
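
Each path in the dump corresponds to a chain of keyed lookups on the
match object.  As a rough sketch (not part of the original message),
and assuming named and indexed lookups chain the same way as in the
shorter examples earlier, the entry $/<expr><term>[0]<op>[0] might be
fetched like this, with the backslashes escaped:

    .sub _main
        .local pmc p6rule
        .local pmc match

        load_bytecode "../../runtime/parrot/library/PGE.pbc"
        p6rule = find_global "PGE", "p6rule"

        $P0 = p6rule("\\w+ | \\( <expr> \\)")
        store_global "factor", $P0
        $P0 = p6rule(":w <factor> [ $<op>:=(\\*|/) <factor> ]*")
        store_global "term", $P0
        $P0 = p6rule(":w <term> [ $<op>:=(\\+|-) <term> ]*")
        store_global "expr", $P0

        $P0 = p6rule("<expr>")
        match = $P0("ab * (de + fg) - jk")

        $P1 = match["expr"]                # $/<expr>
        $P2 = $P1["term"]                  # $/<expr><term> (array)
        $P3 = $P2[0]                       # $/<expr><term>[0]
        $P4 = $P3["op"]                    # $/<expr><term>[0]<op> (array)
        $P5 = $P4[0]                       # $/<expr><term>[0]<op>[0]
        print $P5                          # the dump lists this entry as "*"
        print "\n"
    .end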
