Author: larry
Date: Tue Sep 11 11:54:28 2007
New Revision: 14454
Modified:
doc/trunk/design/syn/S05.pod
Log:
Last (we hope) major revision of regex syntax.
Modified: doc/trunk/design/syn/S05.pod
==
--- doc/trunk/design/syn/S05.pod	(original)
+++ doc/trunk/design/syn/S05.pod	Tue Sep 11 11:54:28 2007
@@ -14,9 +14,9 @@
Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
Larry Wall <[EMAIL PROTECTED]>
Date: 24 Jun 2002
- Last Modified: 6 Sep 2007
+ Last Modified: 11 Sep 2007
Number: 5
- Version: 64
+ Version: 65
This document summarizes Apocalypse 5, which is about the new regex
syntax. We now try to call them I rather than "regular
@@ -36,14 +36,18 @@
=head1 New match result and capture variables
The underlying match result object is now available as the C<$/>
-variable, which is implicitly lexically scoped. All access to the
-current (or most recent) match is through this variable, even when
+variable, which is implicitly lexically scoped. All user access to the
+most recent match is through this variable, even when
it doesn't look like it. The individual capture variables (such as C<$0>,
C<$1>, etc.) are just elements of C<$/>.
By the way, unlike in Perl 5, the numbered capture variables now
start at C<$0> instead of C<$1>. See below.
+During the execution of a match, the current match state is stored in a
+C<$_> variable lexically scoped to an appropriate portion of the match.
+This is transparent to the user for simple matches.
+
=head1 Unchanged syntactic features
The following regex features use the same syntax as in Perl 5:
@@ -75,9 +79,11 @@
While the syntax of C<|> does not change, the default semantics do
change slightly. We are attempting to concoct a pleasing mixture
of declarative and procedural matching so that we can have the
-best of both. See the section below on "Longest-token matching".
+best of both. In short, you need not write your own tokener for
+a grammar because Perl will write one for you. See the section
+below on "Longest-token matching".
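To make the difference in default C<|> semantics concrete, here is a rough analogy in Python, which is not part of this spec. Python's C<re> alternation behaves like Perl 5: the first alternative that matches wins. The hypothetical C<longest_match> helper models longest-token alternation in a deliberately simplified way. It tries every alternative at the same position and keeps the longest result. It ignores declarative prefixes and the real tie-breaking rules described under "Longest-token matching", and here ties simply go to the earlier alternative.

```python
import re

def first_match(alternatives, text, pos=0):
    """Perl 5 style alternation: the first alternative matching at pos wins."""
    for alt in alternatives:
        m = re.compile(alt).match(text, pos)
        if m:
            return m.group()
    return None

def longest_match(alternatives, text, pos=0):
    """Simplified longest-token model (hypothetical helper): try every
    alternative at pos and keep the longest match. Ties go to the earlier
    alternative, which is a simplification of the real rules."""
    best = None
    for alt in alternatives:
        m = re.compile(alt).match(text, pos)
        if m and (best is None or len(m.group()) > len(best)):
            best = m.group()
    return best

alts = ["=", "==", "[a-z]+"]
print(first_match(alts, "==x"))    # =
print(longest_match(alts, "==x"))  # ==
```

This is what lets a grammar skip a hand-written tokener. Listing C<=> before C<==> does not cause C<==> to be split into two tokens.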
-=head1 Simplified lexical parsing
+=head1 Simplified lexical parsing of patterns
Unlike traditional regular expressions, Perl 6 does not require
you to memorize an arbitrary list of metacharacters. Instead it
@@ -202,58 +208,49 @@
=item *
The C<:c> (or C<:continue>) modifier causes the pattern to continue
-scanning from the string's current C<.pos>:
+scanning from the specified position (defaulting to C<$/.to>):
- m:c/ pattern /              # start at end of
- # previous match on $_
+ m:c($p)/ pattern / # start scanning at position $p
Note that this does not automatically anchor the pattern to the starting
location. (Use C<:p> for that.) The pattern you supply to C
has an implicit C<:c> modifier.
-The C<:continue> modifier takes an optional argument of type C
-which specifies the point at which to start scanning for a match.
-This should not be used unless you know what you're doing, or just
-happen to like hard-to-debug infinite loops.
+String positions are of type C and should generally be treated
+as opaque.
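Python's C<re> module offers a close analogy for C<:c>, though it is not the Perl 6 implementation. C<Pattern.search(string, pos)> starts scanning at C<pos> and does not anchor the match there. The helper C<match_continue> below is hypothetical. Unlike the opaque string positions described above, Python positions are plain integer offsets. Python strings also do not remember where the last match ended, so the caller must pass the previous end position explicitly. This mirrors the role C<$/.to> plays as the default.

```python
import re

# Analogy only: Pattern.search(string, pos) scans forward from pos without
# anchoring there, much like m:c($p)/.../ described above.
word = re.compile(r"\w+")
text = "foo bar baz"

def match_continue(pattern, text, p):
    """Hypothetical helper modeling m:c($p)/pattern/: scan forward from p."""
    return pattern.search(text, p)

first = match_continue(word, text, 0)             # finds 'foo', which ends at 3
# :c with no argument defaults to $/.to; here the previous match's end
# has to be threaded through by hand via first.end().
second = match_continue(word, text, first.end())  # skips the space, finds 'bar'
print(first.group(), second.group())              # foo bar
```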
=item *
The C<:p> (or C<:pos>) modifier causes the pattern to try to match only at
-the string's current C<.pos>:
+the specified string position:
- m:p/ pattern /              # match at end of
- # previous match on $_
+ m:pos($p)/ pattern / # match at position $p
-Since this is implicitly anchored to the position, it's suitable for
-building parsers and lexers. The pattern you supply to a Perl macro's
-C trait has an implicit C<:p> modifier.
+If the argument is omitted, it defaults to C<$/.to>. (Unlike in
+Perl 5, the string itself has no clue where its last match ended.)
+All subrule matches are implicitly passed their starting position.
+Likewise, the pattern you supply to a Perl macro's C
+trait has an implicit C<:p> modifier.
Note that
- m:c/pattern/
+ m:c($p)/pattern/
is roughly equivalent to
- m:p/.*? <( pattern )> /
-
-Also note that any regex called as a subrule is implicitly anchored to the
-current position anyway.
-
-The C<:pos> modifier takes an optional argument of type C
-which specifies the point at which to attempt a match. This should not
-be used lightly. Put it in the category of a "goto".
+ m:p($p)/.*? <( pattern )> /
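The equivalence between C<:c> and C<:p> can be checked with a rough Python analogy. This is a sketch only, and all three helper names are hypothetical. C<Pattern.match(string, pos)> plays the role of C<:p> because it succeeds only at C<pos>. C<Pattern.search(string, pos)> plays the role of C<:c>. There is no Python counterpart to C<< <( ... )> >>, so the sketch reports the span of capture group 1 instead. C<re.DOTALL> is used so that C<.> also matches newlines, as it does in Perl 6.

```python
import re

def match_at(pattern, text, p):
    """Analog of m:p($p)/pattern/: succeed only if pattern matches AT p."""
    return re.compile(pattern).match(text, p)

def scan_from(pattern, text, p):
    """Analog of m:c($p)/pattern/: scan forward from p, unanchored."""
    return re.compile(pattern).search(text, p)

def scan_via_pos(pattern, text, p):
    """Analog of m:p($p)/.*? <( pattern )>/: anchor at p, skip ahead lazily,
    and report only the span of the inner pattern (group 1), standing in
    for the <( )> markers. DOTALL lets '.' cross newlines as in Perl 6."""
    m = re.compile(r".*?(" + pattern + r")", re.DOTALL).match(text, p)
    return m.span(1) if m else None

text = "let x = 42;"
print(match_at(r"\d+", text, 0))          # None: no digits at offset 0
print(match_at(r"\d+", text, 8).group())  # 42
print(scan_from(r"\d+", text, 0).span())  # (8, 10)
print(scan_via_pos(r"\d+", text, 0))      # (8, 10)
```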
=item *
The new C<:s> (C<:sigspace>) modifier causes whitespace sequences
to be considered "significant"; they are replaced by a whitespace
-matching rule, C<< <+ws> >>. That is,
+matching rule, C<< <.ws> >>. That is,
m:s/ next cmd = /
is the same as:
- m/ <+ws> next <+ws> cmd <+ws> = <+ws> /
+ m/ <.ws> next <.ws> cmd <.ws> = <.ws> /
which is effectively the same as:
@@ -265,9 +262,9 @@
or equivalently,
- m { (a|\*) <+ws> (b|\+) }
+ m { (a|\*) <.ws> (b|\+) }
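As a rough illustration of the C<:s> rewriting shown above, the hypothetical Python function below treats the pattern source as plain text. It replaces each whitespace run with a call to C<< <.ws> >>, where the leading dot marks a subrule call whose result is not captured. This is a naive textual model under that assumption. A real regex compiler works on parsed atoms and does not touch whitespace inside quotes or character classes, and this sketch ignores those cases.

```python
import re

def sigspace(pattern):
    """Naive model of :s / :sigspace: replace every whitespace run in the
    pattern source with <.ws>. Ignores quoting, character classes and
    other cases a real regex parser would handle."""
    return re.sub(r"\s+", " <.ws> ", pattern).strip()

print(sigspace(" next cmd = "))
# <.ws> next <.ws> cmd <.ws> = <.ws>
print(sigspace(r"(a|\*) (b|\+)"))
# (a|\*) <.ws> (b|\+)
```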
-C<< <+ws>