On May 26, Patrick R. Michaud said:
On Tue, May 24, 2005 at 08:25:03PM -0400, Jeff 'japhy' Pinyan wrote:
I have looked through the latest
revisions of Apo05 and Syn05 (from Dec 2004) and come up with the
following list:
http://japhy.perlmonk.org/perl6/rules.txt
I'll review the list below, but it's also worthwhile to read
http://www.nntp.perl.org/group/perl.perl6.language/21120
which is Larry's latest missive on character classes, and
http://www.nntp.perl.org/group/perl.perl6.language/20985
which describes the capturing semantics (but be sure to note
the lengthy threads that follow concerning changes in the
indexing from $1, $2, ... to $0, $1, ... ).
I'll check them out. Right now, I'm really only concerned with syntax
rather than implementation. Perl6::Rule::Parser will only parse the rule
into a tree structure.
& a&b N conjunction
&var N subroutine
I'm not sure that "&var" means subroutine anymore. A05 does mention
Ok. If it goes away, I'm fine with that.
x**{n..m} N previous atom n..m times
Keeping in mind that the "n..m" can actually be any sort of closure
Yeah, I know.
( (x) Y capture 'x'
) Y must match opening '('
It may be worth noting that parens not only capture, they also
introduce a new scope for any nested subpattern and subrule captures.
Ok. I don't think that'll affect me right now.
:ignorecase N case insensitivity :i
:global N match globally :g
:continue N start scanning after previous match :c
...etc
I'm not sure these are "tokens" in the sense of "single unit of purpose"
in your original message. I think these are all adverbs, and the "token"
is just the initial C<:> at the beginning of a group.
I understand, but that set is particularly important to me, because as far
as I am concerned, the rule
/abc/
is the object Perl6::Rule::Parser::exact->new('abc'), whereas the rule
/:i abc/
is the object Perl6::Rule::Parser::exactf->new('abc') -- this is using
node terminology from Perl 5, where "exactf" means "exact with case
folding".
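To make that concrete, here is a rough Perl 5 sketch of the mapping. Only the two class names come from the paragraph above; the constructor body, the match_at method, and the node_for_literal helper are hypothetical, and lc() is only an approximation of real case folding.

```perl
use strict;
use warnings;

package Perl6::Rule::Parser::exact;
# Literal text, matched case-sensitively.
sub new {
    my ($class, $text) = @_;
    return bless { text => $text }, $class;
}
# Hypothetical helper: does the literal match $str at offset $pos?
sub match_at {
    my ($self, $str, $pos) = @_;
    return substr($str, $pos, length($self->{text})) eq $self->{text};
}

package Perl6::Rule::Parser::exactf;
our @ISA = ('Perl6::Rule::Parser::exact');   # inherits new()
# Same literal, compared after folding case (approximated with lc).
sub match_at {
    my ($self, $str, $pos) = @_;
    return lc(substr($str, $pos, length($self->{text}))) eq lc($self->{text});
}

package main;

# Hypothetical: pick the node class from the :i / :ignorecase state.
sub node_for_literal {
    my ($text, $ignorecase) = @_;
    my $class = $ignorecase
        ? 'Perl6::Rule::Parser::exactf'
        : 'Perl6::Rule::Parser::exact';
    return $class->new($text);
}

my $plain  = node_for_literal('abc', 0);   # /abc/
my $folded = node_for_literal('abc', 1);   # /:i abc/
```

The point is that the modifier changes which node class gets built, so the tree carries the case-folding decision and the matcher never has to look back at the modifier.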
:keepall N all rules and invoked rules remember everything
That's now ":parsetree" according to Damian's proposed capture rules.
Ok. I haven't seen those yet.
<commit> N backtracking fails completely
<cut> N remove what matched up to this point from the
string
<after P> N we must be after the pattern P
<!after P> N we must NOT be after the pattern P
<before P> N we must be before the pattern P
<!before P> N we must NOT be before the pattern P
As with ':words', etc., I'm not sure that these qualify as "tokens"
when parsing the regex -- the tokens are actually "<" or "<!" and
I understand. Luckily this new syntax will enable me to abstract things
in the parser.
my $obj = $S->object(assertion => $name, $neg);
# where $name is the part after the < or <!
# and $neg is a boolean denoting the presence of !
Since there are no longer different prefixes for every type of assertion, I
no longer need to make a specific class of object for each one.
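A rough sketch of how that abstraction might look in Perl 5. Both split_assertion and assertion_node are hypothetical names; assertion_node merely stands in for the $S->object(assertion => $name, $neg) call above, whose real implementation is not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split the text between '<' and '>' into
# (negated flag, assertion name, remaining pattern text).
sub split_assertion {
    my ($inner) = @_;
    my $neg = ($inner =~ s/^!//) ? 1 : 0;    # leading '!' means negated
    my ($name, $rest) = $inner =~ /^(\w+)\s*(.*)$/s
        or return;                            # not a named assertion
    return ($neg, $name, $rest);
}

# One generic node for every assertion, instead of one class per kind.
sub assertion_node {
    my ($name, $neg, $rest) = @_;
    return { type => 'assertion', name => $name, neg => $neg, arg => $rest };
}

for my $src ('before P', '!after P', 'commit') {
    my ($neg, $name, $rest) = split_assertion($src) or next;
    my $node = assertion_node($name, $neg, $rest);
    printf "%-10s name=%s neg=%d arg='%s'\n",
        "<$src>", $node->{name}, $node->{neg}, $node->{arg};
}
```

Because the name and the negation flag are plain data, a new assertion needs no new class; the parser only has to recognize the leading "<" or "<!".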
<?ws> N match whitespace by :w rules
<?sp> N match a space character (chr 32 ONLY)
Here the token is "<?", indicating a non-capturing subrule.
Right.
<$rule> N indirect rule
<::$rulename> N indirect symbolic rule
<@rules> N like '@rules'
<%rules> N like '%rules'
<{ code }> N code produces a rule
<&foo()> N subroutine returns rule
<( code )> N code must return true or backtracking ensues
Here the leading tokens are actually "<$", "<::$", "<@", "<%", "<{", "<&",
and "<(", and I suspect we have "<?$", "<?::$", "<?@", and "<!$", "<!::$",
"<!@", etc. counterparts.
Per your second message, <!@rules> would mean <!before <@rules>>,
right?
Of course, one could claim that these are
really separated as in "<", "?", and "$" tokens, but PGE's parser currently
treats them as a unit to make it easier to jump directly into the correct
handler for what follows.
Yes, so does mine. :)
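For illustration, treating the leading tokens as units could be sketched in Perl 5 as a longest-match lookup. The token strings are the ones listed in this thread; the handler names and the leading_token function are hypothetical, and this is not a description of how PGE actually does it.

```perl
use strict;
use warnings;

# Leading tokens named in the thread, mapped to hypothetical handler names.
my %handler_for = (
    '<'    => 'subrule',
    '<?'   => 'noncapturing_subrule',
    '<!'   => 'negated_subrule',
    '<$'   => 'indirect_rule',
    '<::$' => 'symbolic_rule',
    '<@'   => 'rule_array',
    '<%'   => 'rule_hash',
    '<{'   => 'code_rule',
    '<&'   => 'sub_rule',
    '<('   => 'code_assertion',
    '<['   => 'char_class',
    '<+'   => 'char_class_add',
    '<-'   => 'char_class_sub',
);

# Try longer tokens first so that '<::$' wins over a bare '<'.
my @tokens = sort { length($b) <=> length($a) or $a cmp $b } keys %handler_for;
my $alternation = join '|', map { quotemeta($_) } @tokens;
my $token_re    = qr/^($alternation)/;

# Return (token, handler name) for the start of $src, or an empty list.
sub leading_token {
    my ($src) = @_;
    return $src =~ $token_re ? ($1, $handler_for{$1}) : ();
}

for my $src ('<::$rulename>', '<?ws>', '<-[a..z]>', '<digit>') {
    my ($tok, $handler) = leading_token($src);
    print "$src starts with $tok, handled by $handler\n";
}
```

The suspected "<?$", "<!@", etc. counterparts are left out here, since the thread does not settle whether they exist.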
<[a-z]> N character class
<+alpha> N character class
<-[a-z]> N complemented character class
The tokens for character class manipulation are currently "<[", "<+",
and "<-", although that's not officially documented in A05 or S05 yet.
Also, ranges are now <[a..z]> -- an unescaped hyphen appearing in an
enumerated character class generates a warning.
<+\w-[0-9]> N character class "arithmetic"
I'm not sure that it's been decided/documented that \w, \s, etc.
can appear in character class arithmetic (although it seems like it
should).
The new character class idiom is going to confuse me for a while. I'll
have to read the above URL in which Larry sheds light.
<prop:X> N Unicode property match
<-prop:X> N complemented Unicode property match
Here "prop" is just a subrule (or character class) similar to
<+alpha>, <+digit>, etc. Also, note that <prop:X> is a capturing
subrule, while <+prop:X> would be a character class match (and presumably
not capture).
I think I'll wait to handle Unicode properties until a syntax has been
agreed upon... <prop:X>, <X>, <prop(X)>, etc.
<rule> N match rule (and capture to $rule)
<?rule> N match rule (don't capture)
<<rule>> N match rule (don't capture)
Do we still have the <<rule>> syntax, or was that abandoned in
favor of <?rule> ? (I know there are still some remnants of <<...>>
in S05 and A05, but I'm not sure they're intentional.)
I saw <<...>> in A/S 05, but if they're accidental, then I just won't deal
with it.
And, what's the deal with <RULE> capturing? Does that mean I have to
write <?digit> everywhere instead of <digit> unless I want a capture? Eh,
I guess \d exists for that reason...
Thanks for your help. Unless you're difficult.
"You're welcome" unless $Pm ~~ /<?difficult>/;
Difficulty nonexistent.
--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://japhy.perlmonk.org/ % have long ago been overpaid?
http://www.perlmonks.org/ % -- Meister Eckhart