Re: comprehensive list of perl6 rule tokens

Jeff 'japhy' Pinyan Thu, 26 May 2005 16:09:16 -0700

On May 26, Patrick R. Michaud said:

On Tue, May 24, 2005 at 08:25:03PM -0400, Jeff 'japhy' Pinyan wrote:

I have looked through the latest
revisions of Apo05 and Syn05 (from Dec 2004) and come up with the
following list:


  http://japhy.perlmonk.org/perl6/rules.txt


I'll review the list below, but it's also worthwhile to read

  http://www.nntp.perl.org/group/perl.perl6.language/21120

which is Larry's latest missive on character classes, and

  http://www.nntp.perl.org/group/perl.perl6.language/20985

which describes the capturing semantics (but be sure to note
the lengthy threads that follow concerning changes in the
indexing from $1, $2, ... to $0, $1, ... ).

I'll check them out. Right now, I'm really only concerned with syntax rather than implementation. Perl6::Rule::Parser will only parse the rule into a tree structure.

        &   a&b         N       conjunction
                &var                N       subroutine

I'm not sure that "&var" means subroutine anymore.  A05 does mention


Ok.  If it goes away, I'm fine with that.

        x**{n..m}       N       previous atom n..m times

Keeping in mind that the "n..m" can actually be any sort of closure


Yeah, I know.

        (       (x)             Y       capture 'x'
        )                       Y       must match opening '('

It may be worth noting that parens not only capture, they also
introduce a new scope for any nested subpattern and subrule captures.


Ok.  I don't think that'll affect me right now.

        :ignorecase     N       case insensitivity :i
        :global         N       match globally :g
        :continue       N       start scanning after previous match :c
       ...etc

I'm not sure these are "tokens" in the sense of "single unit of purpose"
in your original message.  I think these are all adverbs, and the "token"
is just the initial C<:> at the beginning of a group.

I understand, but that set is particularly important to me, because as far as I am concerned, the rule


  /abc/

is the object Perl6::Rule::Parser::exact->new('abc'), whereas the rule

  /:i abc/

is the object Perl6::Rule::Parser::exactf->new('abc') -- this is using node terminology from Perl 5, where "exactf" means "exact with case folding".
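
A minimal sketch of that choice, assuming the modifiers seen so far arrive as a hash. The helper name literal_node and the plain-hash nodes are hypothetical stand-ins, not the real Perl6::Rule::Parser classes:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser's node constructors: a plain hash
# is returned instead of a real ::exact or ::exactf object.
sub literal_node {
    my ($text, %mod) = @_;
    # :i (ignorecase) switches the literal to its case-folding variant.
    my $type = $mod{ignorecase} ? 'exactf' : 'exact';
    return { type => $type, text => $text };
}

for my $node (literal_node('abc'), literal_node('abc', ignorecase => 1)) {
    printf "%s('%s')\n", $node->{type}, $node->{text};
}
```

The point is only that the adverb changes which node gets built, so the parser has to know about it even though it is not a token in the strict sense.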

        :keepall        N       all rules and invoked rules remember everything

That's now  ":parsetree" according to Damian's proposed capture rules.


Ok.  I haven't seen those yet.

        <commit>  N       backtracking fails completely
        <cut>             N       remove what matched up to this point from the string
        <after P> N       we must be after the pattern P
        <!after P>        N       we must NOT be after the pattern P
        <before P>        N       we must be before the pattern P
        <!before P>       N       we must NOT be before the pattern P

As with ':words', etc., I'm not sure that these qualify as "tokens"
when parsing the regex -- the tokens are actually "<" or "<!" and

I understand. Luckily this new syntax will enable me to abstract things in the parser.


  my $obj = $S->object(assertion => $name, $neg);
  # where $name is the part after the < or <!
  # and $neg is a boolean denoting the presence of !

Since there are no longer different prefixes for every type of assertion, I no longer need to make specific classes of objects.
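
As a rough illustration of how $name and $neg could be pulled out of the text between the angle brackets (split_assertion and the returned hash are hypothetical, and the argument handling is an assumption):

```perl
use strict;
use warnings;

# Hypothetical helper: take the text between '<' and '>' and separate
# the negation flag, the assertion name, and any trailing argument.
sub split_assertion {
    my ($inner) = @_;                         # e.g. "!before P" or "commit"
    my $neg = ($inner =~ s/^!//) ? 1 : 0;     # strip a leading '!' if present
    my ($name, $arg) = split ' ', $inner, 2;  # name, then the rest (may be undef)
    return { name => $name, neg => $neg, arg => $arg };
}

my $node = split_assertion('!before P');
printf "name=%s neg=%d arg=%s\n", $node->{name}, $node->{neg}, $node->{arg};
```

With something like this, <after P>, <!after P>, <before P> and <!before P> all go through one code path and differ only in the values handed to the object constructor.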

       <?ws>              N       match whitespace by :w rules
        <?sp>             N       match a space character (chr 32 ONLY)

Here the token is "<?", indicating a non-capturing subrule.


Right.

        <$rule>           N       indirect rule
        <::$rulename>     N       indirect symbolic rule
        <@rules>  N       like '@rules'
        <%rules>  N       like '%rules'
        <{ code }>        N       code produces a rule
        <&foo()>      N       subroutine returns rule
        <( code )>        N       code must return true or backtracking ensues

Here the leading tokens are actually "<$", "<::$", "<@", "<%", "<{", "<&",
and "<(", and I suspect we have "<?$", "<?::$", "<?@", and "<!$", "<!::$",
"<!@", etc. counterparts.


Per your second message, <!@rules> would mean <!before <@rules>>, right?

                           Of course, one could claim that these are
really separated as in "<", "?", and "$" tokens, but PGE's parser currently
treats them as a unit to make it easier to jump directly into the correct
handler for what follows.


Yes, so does mine. :)
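
Roughly along these lines, as a sketch only: the prefixes are the ones quoted above plus a bare "<" fallback, while the handler names and the leading_token helper are made up for illustration and are not taken from PGE or Perl6::Rule::Parser:

```perl
use strict;
use warnings;

# Leading tokens from the quoted list; handler names are hypothetical.
my %handler = (
    '<$'   => 'indirect_rule',
    '<::$' => 'symbolic_rule',
    '<@'   => 'rule_array',
    '<%'   => 'rule_hash',
    '<{'   => 'code_rule',
    '<&'   => 'sub_rule',
    '<('   => 'code_assertion',
    '<'    => 'subrule',          # assumed fallback for plain <rule>
);

# Try longer prefixes first so that '<::$' wins over the bare '<'.
my @prefixes = sort { length($b) <=> length($a) } keys %handler;

sub leading_token {
    my ($src) = @_;
    for my $p (@prefixes) {
        return ($p, $handler{$p}) if index($src, $p) == 0;
    }
    return;                        # empty list: no leading token here
}

my ($tok, $how) = leading_token('<::$rulename>');
print "$tok => $how\n";
```

Treating the whole prefix as one unit means the lookup lands directly on the right handler, without a second pass over "?", "!" or the sigil.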

        <[a-z]>           N       character class
        <+alpha>  N       character class
        <-[a-z]>  N       complemented character class

The tokens for character class manipulation are currently "<[", "<+",
and "<-", although that's not officially documented in A05 or S05 yet.
Also, ranges are now <[a..z]> -- an unescaped hyphen appearing in an
enumerated character class generates a warning.

        <+\w-[0-9]>       N       character class "arithmetic"

I'm not sure that it's been decided/documented that \w, \s, etc.
can appear in character class arithmetic (although it seems like it
should).

The new character class idiom is going to confuse me for a while. I'll have to read the above URL in which Larry sheds light.

        <prop:X>  N       Unicode property match
        <-prop:X> N       complemented Unicode property match

Here "prop" is just a subrule (or character class) similar to
<+alpha>, <+digit>, etc.  Also, note that <prop:X> is a capturing
subrule, while <+prop:X> would be a character class match (and presumably
not capture).

I think I'll wait to handle Unicode properties until a syntax has been agreed upon... <prop:X>, <X>, <prop(X)>, etc.

        <rule>            N       match rule (and capture to $rule)
        <?rule>           N       match rule (don't capture)
        <<rule>>    N       match rule (don't capture)

Do we still have the <<rule>> syntax, or was that abandoned in
favor of <?rule> ?  (I know there are still some remnants of <<...>>
in S05 and A05, but I'm not sure they're intentional.)

I saw <<...>> in A/S 05, but if they're accidental, then I just won't deal with it.

And, what's the deal with <RULE> capturing? Does that mean I have to write <?digit> everywhere instead of <digit> unless I want a capture? Eh, I guess \d exists for that reason...

Thanks for your help.  Unless you're difficult.


   "You're welcome"  unless $Pm ~~ /<?difficult>/;


Difficulty nonexistent.

--
Jeff "japhy" Pinyan         %  How can we ever be the sold short or
RPI Acacia Brother #734     %  the cheated, we who for every service
http://japhy.perlmonk.org/  %  have long ago been overpaid?
http://www.perlmonks.org/   %    -- Meister Eckhart
