[svn:perl6-synopsis] r14454 - doc/trunk/design/syn

2007-09-11 Thread larry
Author: larry
Date: Tue Sep 11 11:54:28 2007
New Revision: 14454

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Last (we hope) major revision of regex syntax.


Modified: doc/trunk/design/syn/S05.pod
==
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Tue Sep 11 11:54:28 2007
@@ -14,9 +14,9 @@
Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
Larry Wall <[EMAIL PROTECTED]>
Date: 24 Jun 2002
-   Last Modified: 6 Sep 2007
+   Last Modified: 11 Sep 2007
Number: 5
-   Version: 64
+   Version: 65
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> rather than "regular
@@ -36,14 +36,18 @@
 =head1 New match result and capture variables
 
 The underlying match result object is now available as the C<$/>
-variable, which is implicitly lexically scoped.  All access to the
-current (or most recent) match is through this variable, even when
+variable, which is implicitly lexically scoped.  All user access to the
+most recent match is through this variable, even when
 it doesn't look like it.  The individual capture variables (such as C<$0>,
 C<$1>, etc.) are just elements of C<$/>.
 
 By the way, unlike in Perl 5, the numbered capture variables now
 start at C<$0> instead of C<$1>.  See below.
 
+During the execution of a match, the current match state is stored in a
+C<$_> variable lexically scoped to an appropriate portion of the match.
+This is transparent to the user for simple matches.
+
 =head1 Unchanged syntactic features
 
 The following regex features use the same syntax as in Perl 5:
@@ -75,9 +79,11 @@
 While the syntax of C<|> does not change, the default semantics do
 change slightly.  We are attempting to concoct a pleasing mixture
 of declarative and procedural matching so that we can have the
-best of both.  See the section below on "Longest-token matching".
+best of both.  In short, you need not write your own tokener for
+a grammar because Perl will write one for you.  See the section
+below on "Longest-token matching".
 
-=head1 Simplified lexical parsing
+=head1 Simplified lexical parsing of patterns
 
 Unlike traditional regular expressions, Perl 6 does not require
 you to memorize an arbitrary list of metacharacters.  Instead it
@@ -202,58 +208,49 @@
 =item *
 
 The C<:c> (or C<:continue>) modifier causes the pattern to continue
-scanning from the string's current C<.pos>:
+scanning from the specified position (defaulting to C<$/.to>):
 
- m:c/ pattern /        # start at end of
-                       # previous match on $_
+ m:c($p)/ pattern / # start scanning at position $p
 
 Note that this does not automatically anchor the pattern to the starting
 location.  (Use C<:p> for that.)  The pattern you supply to C<split>
 has an implicit C<:c> modifier.
 
-The C<:continue> modifier takes an optional argument of type C<StrPos>
-which specifies the point at which to start scanning for a match.
-This should not be used unless you know what you're doing, or just
-happen to like hard-to-debug infinite loops.
+String positions are of type C<StrPos> and should generally be treated
+as opaque.
 
 =item *
 
 The C<:p> (or C<:pos>) modifier causes the pattern to try to match only at
-the string's current C<.pos>:
+the specified string position:
 
- m:p/ pattern /        # match at end of
-                       # previous match on $_
+ m:pos($p)/ pattern /  # match at position $p
 
-Since this is implicitly anchored to the position, it's suitable for
-building parsers and lexers.  The pattern you supply to a Perl macro's
-C<is parsed> trait has an implicit C<:p> modifier.
+If the argument is omitted, it defaults to C<$/.to>.  (Unlike in
+Perl 5, the string itself has no clue where its last match ended.)
+All subrule matches are implicitly passed their starting position.
+Likewise, the pattern you supply to a Perl macro's C<is parsed>
+trait has an implicit C<:p> modifier.
 
 Note that
 
- m:c/pattern/
+ m:c($p)/pattern/
 
 is roughly equivalent to
 
- m:p/.*? <( pattern )> /
-
-Also note that any regex called as a subrule is implicitly anchored to the
-current position anyway.
-
-The C<:pos> modifier takes an optional argument of type C<StrPos>
-which specifies the point at which to attempt a match.  This should not
-be used lightly.  Put it in the category of a "goto".
+ m:p($p)/.*? <( pattern )> /
 
 =item *
 
 The new C<:s> (C<:sigspace>) modifier causes whitespace sequences
 to be considered "significant"; they are replaced by a whitespace
-matching rule, C<< <+ws> >>.  That is,
+matching rule, C<< <.ws> >>.  That is,
 
  m:s/ next cmd =   /
 
 is the same as:
 
- m/ <+ws> next <+ws> cmd <+ws> = <+ws> /
+ m/ <.ws> next <.ws> cmd <.ws> = <.ws> /
 
 which is effectively the same as:
 
@@ -265,9 +262,9 @@
 
 or equivalently,
 
- m { (a|\*) <+ws> (b|\+) }
+ m { (a|\*) <.ws> (b|\+) }
 
-C<< <+ws> 

[svn:perl6-synopsis] r14455 - doc/trunk/design/syn

2007-09-11 Thread larry
Author: larry
Date: Tue Sep 11 12:06:38 2007
New Revision: 14455

Modified:
   doc/trunk/design/syn/S05.pod

Log:
wb is zero-width so takes ? or !


Modified: doc/trunk/design/syn/S05.pod
==
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Tue Sep 11 12:06:38 2007
@@ -1337,9 +1337,9 @@
 
 A C<«> or C<<< << >>> token indicates a left word boundary.  A C<»> or
 C<<< >> >>> token indicates a right word boundary.  (As separate tokens,
-these need not be balanced.)  Perl 5's C<\b> is replaced by a C<< <.wb> >>
+these need not be balanced.)  Perl 5's C<\b> is replaced by a C<< <?wb> >>
 "word boundary" assertion, while C<\B> becomes C<< <!wb> >>.  (None of
-these are dependent on the definition of C<< <ws> >>, but only on the C<\w>
+these are dependent on the definition of C<< <.ws> >>, but only on the C<\w>
 definition of "word" characters.)
 
 =back


[svn:perl6-synopsis] r14456 - doc/trunk/design/syn

2007-09-11 Thread larry
Author: larry
Date: Tue Sep 11 13:20:07 2007
New Revision: 14456

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Clarifications requested by pmichaud++


Modified: doc/trunk/design/syn/S05.pod
==
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Tue Sep 11 13:20:07 2007
@@ -662,8 +662,18 @@
 \s+  { print "but does contain whitespace\n" }
  /
 
+An B<explicit> reduce from a regex closure binds the I<result object>
+for this match:
+
+/ (\d) { reduce $0.sqrt } Remainder /;
+
+This has the effect of capturing the square root of the numified string,
+instead of the string.  The C<Remainder> part is matched but is not returned
+unless the first reduce is later overridden by another reduce.
+
+These closures are invoked with a topic (C<$_>) of the current match state.
 Within a closure, the instantaneous position within the search is
-denoted by the special variable C<$¢>.  As with all string positions,
+denoted by the C<.pos> method on that object.  As with all string positions,
 you must not treat it as a number unless you are very careful about
 which units you are dealing with.
 
@@ -1122,18 +1132,6 @@
 The closure is guaranteed to be run at the canonical time; it declares
 a sequence point, and is considered to be procedural.
 
-As with an ordinary embedded closure, an B<explicit> return from a
-regex closure binds the I<result object> for this match, ignores the
-rest of the current regex, and reports success:
-
-/ (\d) <{ return $0.sqrt }> NotReached /;
-
-This has the effect of capturing the square root of the numified string,
-instead of the string.  The C<NotReached> part is not reached.
-
-These closures are invoked as anonymous methods on the C<Match> object.
-See L</Match objects> below for more about result objects.
-
 =item *
 
 A leading C<&> interpolates the return value of a subroutine call as