[svn:perl6-synopsis] r14457 - doc/trunk/design/syn

2007-09-13 Thread larry
Author: larry
Date: Thu Sep 13 08:59:19 2007
New Revision: 14457

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Suggestioned clarifications from lots of folks++


Modified: doc/trunk/design/syn/S05.pod
==
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Thu Sep 13 08:59:19 2007
@@ -14,9 +14,9 @@
Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
Larry Wall <[EMAIL PROTECTED]>
Date: 24 Jun 2002
-   Last Modified: 11 Sep 2007
+   Last Modified: 13 Sep 2007
Number: 5
-   Version: 65
+   Version: 66
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> rather than "regular
@@ -44,9 +44,6 @@
 By the way, unlike in Perl 5, the numbered capture variables now
 start at C<$0> instead of C<$1>.  See below.
 
-During the execution of a match, the current match state is stored in a
-C<$_> variable lexically scoped to an appropriate portion of the match.
-This is transparent to the user for simple matches.
 
 =head1 Unchanged syntactic features
 
@@ -333,11 +330,11 @@
 If followed by an C<x>, it means repetition.  Use C<:x(4)> for the
 general form.  So
 
- s:4x [ (<.ident>) = (\N+) $$] [$0 => $1];
+ s:4x [ (<.ident>) = (\N+) $$] = "$0 => $1";
 
 is the same as:
 
- s:x(4) [ (<.ident>) = (\N+) $$] [$0 => $1];
+ s:x(4) [ (<.ident>) = (\N+) $$] = "$0 => $1";
 
 which is almost the same as:
 
@@ -407,7 +404,7 @@
 
 The new C<:rw> modifier causes this regex to I<claim> the current
 string for modification rather than assuming copy-on-write semantics.
-All the bindings in C<$/> become lvalues into the string, such
+All the captures in C<$/> become lvalues into the string, such
 that if you modify, say, C<$1>, the original string is modified in
 that location, and the positions of all the other fields modified
 accordingly (whatever that means).  In the absence of this modifier
@@ -662,20 +659,32 @@
 \s+  { print "but does contain whitespace\n" }
  /
 
-An B<explicit> reduce from a regex closure binds the I<result object>
+An B<explicit> reduction using the C<make> function sets the I<result object>
 for this match:
 
-/ (\d) { reduce $0.sqrt } Remainder /;
+/ (\d) { make $0.sqrt } Remainder /;
 
 This has the effect of capturing the square root of the numified string,
 instead of the string.  The C<Remainder> part is matched but is not returned
-unless the first reduce is later overridden by another reduce.
+unless the first C<make> is later overridden by another C<make>.
 
-These closures are invoked with a topic (C<$_>) of the current match state.
-Within a closure, the instantaneous position within the search is
-denoted by the C<.pos> method on that object.  As with all string positions,
-you must not treat it as a number unless you are very careful about
-which units you are dealing with.
+These closures are invoked with a topic (C<$_>) of the current match
+state (a C<Cursor> object).  Within a closure, the instantaneous
+position within the search is denoted by the C<.pos> method on
+that object.  As with all string positions, you must not treat it
+as a number unless you are very careful about which units you are
+dealing with.
+
+The C<Cursor> object can also return the original item that we are
+matching against; this is available from the C<._> method, named to
+remind you that it probably came from the user's C<$_> variable.
+(But that may well be off in some other scope when indirect rules
+are called, so we mustn't rely on the user's lexical scope.)
+
+The closure is also guaranteed to start with a C<$/> C<Match> object
+representing the match so far.  However, if the closure does its own
+internal matching, its C<$/> variable will be rebound to the result
+of I<that> match until the end of the embedded closure.
 
 =item *
 
@@ -747,6 +756,11 @@
 foo,
 foo,bar,
 
+It is legal for the separator to be zero-width as long as the pattern on
+the left progresses on each iteration:
+
+. **# match sequence of identical characters
+
 =item *
 
 C<< <...> >> are now extensible metasyntax delimiters or I<assertions>
@@ -784,7 +798,7 @@
 C<< <$var> >>.  (See assertions below.)  This form does not capture,
 and it fails if C<$var> is tainted.
 
-However, a variable used as the left side of a binding or submatch
+However, a variable used as the left side of an alias or submatch
 operator is not used for matching.
 
 $x = 
@@ -795,13 +809,41 @@
 
 "$0" ~~ 
 
-It is non-sensical to bind to something that is not a variable:
+On the other hand, it is non-sensical to alias to something that is
+not a variable:
 
 "$0" =  # ERROR
+$0 =# okay
+$x =# okay, temporary capture
+$ =  # okay, persistent capture
+  # same thing
+
+Variables declared in capture aliases are lexically scoped to the
+rest of the regex.  You should not confuse this use of C<=> with
+either ordinary assignment or ordinary binding.  You should read
+the C<=> more like the 

[svn:perl6-synopsis] r14458 - doc/trunk/design/syn

2007-09-13 Thread pmichaud
Author: pmichaud
Date: Thu Sep 13 10:04:14 2007
New Revision: 14458

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Fix up some unquoted punctuation in regexes.


Modified: doc/trunk/design/syn/S05.pod
==
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Thu Sep 13 10:04:14 2007
@@ -243,15 +243,15 @@
 to be considered "significant"; they are replaced by a whitespace
 matching rule, C<< <.ws> >>.  That is,
 
- m:s/ next cmd =   /
+ m:s/ next cmd '='   /
 
 is the same as:
 
- m/ <.ws> next <.ws> cmd <.ws> = <.ws> /
+ m/ <.ws> next <.ws> cmd <.ws> '=' <.ws> /
 
 which is effectively the same as:
 
- m/ \s* next \s+ cmd \s* = \s* /
+ m/ \s* next \s+ cmd \s* '=' \s* /
 
 But in the case of
 
@@ -330,15 +330,15 @@
 If followed by an C<x>, it means repetition.  Use C<:x(4)> for the
 general form.  So
 
- s:4x [ (<.ident>) = (\N+) $$] = "$0 => $1";
+ s:4x [ (<.ident>) '=' (\N+) $$] = "$0 => $1";
 
 is the same as:
 
- s:x(4) [ (<.ident>) = (\N+) $$] = "$0 => $1";
+ s:x(4) [ (<.ident>) '=' (\N+) $$] = "$0 => $1";
 
 which is almost the same as:
 
- s:c[ (<.ident>) = (\N+) $$] = "$0 => $1" for 1..4;
+ s:c[ (<.ident>) '=' (\N+) $$] = "$0 => $1" for 1..4;
 
 except that the string is unchanged unless all four matches are found.
 However, ranges are allowed, so you can say C<:x(1..4)> to change anywhere
@@ -462,7 +462,7 @@
 The C<:i>, C<:s>, C<:Perl5>, and Unicode-level modifiers can be
 placed inside the regex (and are lexically scoped):
 
- m/:s alignment = [:i left|right|cent[er|re]] /
+ m/:s alignment '=' [:i left|right|cent[er|re]] /
 
 As with modifiers outside, only parentheses are recognized as valid
 brackets for args to the adverb.  In particular:
@@ -2085,20 +2085,20 @@
 
 can also be written:
 
- $result = mm/ (\S+) => (\S+)/;
+ $result = mm/ (\S+) '=>' (\S+)/;
  ($key, $val) = @$result;
 
 To get a single capture into a string, use a subscript:
 
- $mystring = "{ mm/ (\S+) => (\S+)/[0] }";
+ $mystring = "{ mm/ (\S+) '=>' (\S+)/[0] }";
 
 To get all the captures into a string, use a I slice:
 
- $mystring = "{ mm/ (\S+) => (\S+)/[] }";
+ $mystring = "{ mm/ (\S+) '=>' (\S+)/[] }";
 
 Or cast it into an array:
 
- $mystring = "@( mm/ (\S+) => (\S+)/ )";
+ $mystring = "@( mm/ (\S+) '=>' (\S+)/ )";
 
 Note that, as a scalar variable, C<$/> doesn't automatically flatten
 in list context.  Use C<@()> as a shorthand for C<@($/)> to flatten
@@ -2457,7 +2457,7 @@
 C<|> or C<||> (but not after each C<&> or C<&&>). Hence:
 
   # $0  $1$2   $3$4   $5
- $tune_up = rx/ (don't) (ray) (me) (for) (solar tea), (d'oh!)
+ $tune_up = rx/ ("don't") (ray) (me) (for) (solar tea), ("d'oh!")
   # $0  $1  $2$3$4
   | (every) (green) (BEM) (devours) (faces)
   /;
@@ -2800,7 +2800,7 @@
 This I behavior is particularly useful for reinstituting
 Perl5 semantics for consecutive subpattern numbering in alternations:
 
- $tune_up = rx/ (don't) (ray) (me) (for) (solar tea), (d'oh!)
+ $tune_up = rx/ ("don't") (ray) (me) (for) (solar tea), ("d'oh!")
   | $6 = (every) (green) (BEM) (devours) (faces)
   #  $7  $8$9$10
   /;
@@ -3267,9 +3267,9 @@
 so too a grammar can collect a set of named rules together:
 
  grammar Identity {
- rule name { Name = (\N+) }
- rule age  { Age  = (\d+) }
- rule addr { Addr = (\N+) }
+ rule name { Name '=' (\N+) }
+ rule age  { Age  '=' (\d+) }
+ rule addr { Addr '=' (\N+) }
  rule desc {
   \n
\n

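The all-or-nothing behaviour that S05 describes for C<:x(4)> (the string is unchanged unless all four matches are found) can be modelled outside Perl 6. The following is only a rough Python sketch of that semantics, not the Perl 6 implementation: the helper name `subst_exactly` is hypothetical, and the Python pattern `([A-Za-z_]\w*)=(.+)$` is an approximation of `(<.ident>) '=' (\N+) $$`.

```python
import re

def subst_exactly(pattern, repl, text, count):
    """Replace the first `count` matches of `pattern` in `text`.

    If fewer than `count` matches exist, return `text` unchanged,
    mirroring the "unchanged unless all N matches are found" rule.
    (Hypothetical helper, for illustration only.)
    """
    matches = list(re.finditer(pattern, text, flags=re.MULTILINE))
    if len(matches) < count:
        return text  # not enough matches: leave the string alone
    return re.sub(pattern, repl, text, count=count, flags=re.MULTILINE)

# Rough analogue of: s:x(4) [ (<.ident>) '=' (\N+) $$ ] = "$0 => $1"
PAIR = r"([A-Za-z_]\w*)=(.+)$"

five = "a=1\nb=2\nc=3\nd=4\ne=5"
three = "a=1\nb=2\nc=3"

print(subst_exactly(PAIR, r"\1 => \2", five, 4))   # first four lines rewritten
print(subst_exactly(PAIR, r"\1 => \2", three, 4))  # returned unchanged
```

By contrast, the `for 1..4` form quoted in the diff would apply each successful substitution as it goes, which is why S05 calls the two only "almost the same".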

[svn:perl6-synopsis] r14459 - doc/trunk/design/syn

2007-09-13 Thread larry
Author: larry
Date: Thu Sep 13 10:32:53 2007
New Revision: 14459

Modified:
   doc/trunk/design/syn/S05.pod

Log:
grammaro from Coke++


Modified: doc/trunk/design/syn/S05.pod
==
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Thu Sep 13 10:32:53 2007
@@ -106,7 +106,7 @@
 
 moose*
 
-quantifies only the 'e' and match "mooseee", saying
+quantifies only the 'e' and matches "mooseee", saying
 
 'moose'*
 


[svn:perl6-synopsis] r14460 - doc/trunk/design/syn

2007-09-13 Thread larry
Author: larry
Date: Thu Sep 13 16:29:08 2007
New Revision: 14460

Modified:
   doc/trunk/design/syn/S03.pod
   doc/trunk/design/syn/S05.pod

Log:
Clarifications requested by Wolfgang Laun++, and then some


Modified: doc/trunk/design/syn/S03.pod
==
--- doc/trunk/design/syn/S03.pod (original)
+++ doc/trunk/design/syn/S03.pod Thu Sep 13 16:29:08 2007
@@ -12,9 +12,9 @@
 
   Maintainer: Larry Wall <[EMAIL PROTECTED]>
   Date: 8 Mar 2004
-  Last Modified: 6 Sep 2007
+  Last Modified: 13 Sep 2007
   Number: 3
-  Version: 121
+  Version: 122
 
 =head1 Overview
 
@@ -355,31 +355,106 @@
 say $x unless %seen{$x}++;
 
 Increment of a C<Str> (in a suitable container) works similarly to
-Perl 5, but is generalized slightly.  First, the string is examined
-to see if it could be the string representation of a number in
-any common representation, including floating point and radix
-notation. (Surrounding whitespace is also allowed around such a
-number.)  If it appears to be a number, it is converted to a number
-and incremented as a number.  Otherwise, a scan is made for the
-final alphanumeric sequence in the string.  Unlike in Perl 5, this
+Perl 5, but is generalized slightly.
+A scan is made for the final alphanumeric sequence in
+the string that is not preceded by a '.' character.  Unlike in Perl 5, this
 alphanumeric sequence need not be anchored to the beginning of the
-string, nor does it need to begin with an alphabetic character; the
-final sequence in the string matching C<\w+> is incremented regardless
-of what comes before it.  For its typical use of incrementing a
-filename, you don't have to worry about the path name, but you do
-still have to worry about the extension, so you probably want to say
-
-my $fh = open $filename++ ~ '.jpg';
-
-Alternately, you can increment a submatch:
-
-$filename ~~ s[ <( <<\w+>> )> \.\w+$] = $().succ;
-
-Perl 6 also supports C<Str> decrement with similar semantics.
-
-Increment and decrement are defined in terms of the C<.succ> and
-C<.pred> methods on the type of object in the C<Scalar> container.
-More specifically,
+string, nor does it need to begin with an alphabetic character;
+the final sequence in the string matching C<< <rangechar>+ >>
+is incremented regardless of what comes before it.
+
+The C<< <rangechar> >> character class is defined as that subset of
+C<\w> that Perl knows how to increment within a range, as defined
+below.
+
+The additional matching behaviors provide two useful benefits:
+for its typical use of incrementing a filename, you don't have to
+worry about the path name or the extension:
+
+    $file = "/tmp/pix000.jpg";
+    $file++;            # /tmp/pix001.jpg, not /tmp/pix000.jph
+
+Perhaps more to the point, if you happen to increment a string that ends
+with a decimal number, it's likely to do the right thing:
+
+    $num = "123.456";
+    $num++;             # 124.456, not 123.457
+
+Character positions are incremented within their natural range for
+any Unicode range that is deemed to represent the digits 0..9 or
+that is deemed to be a complete cyclical alphabet for a (one case
+of) a (Unicode) script.  Only scripts that represent their alphabet
+in codepoints that form a cycle independent of other alphabets may
+be so used.  (This specification defers to the users of such a script
+for determining the proper cycle of letters.)  We arbitrarily define
+the ASCII alphabet not to intersect with other scripts that make use
+of characters in that range, but alphabets that intersperse ASCII letters are
+not allowed.
+
+If the current character in a string position is the final character
+in such a range, it wraps to the first character of the range and
+sends a "carry" to the position left of it, and that position is
+then incremented in its own range.  If and only if the leftmost
+position is exhausted in its range, an additional character of the
+same range is inserted to hold the carry in the same fashion as Perl
+5, so incrementing '(zz99)' turns into '(aaa00)' and incrementing
+'(99zz)' turns into '(100aa)'.
+
+The following Unicode ranges are some of the possible rangechar ranges.
+For alphabets we might have ranges like:
+
+    A..Z        # ASCII uc
+    a..z        # ASCII lc
+    Α..Ω        # Greek uc
+    α..ω        # Greek lc (presumably skipping U+03C2, final sigma)
+    א..ת        # Hebrew
+      etc.      # (XXX out of my depth here)
+
+For digits we have ranges like:
+
+    0..9        # ASCII
+    ٠..٩        # Arabic-Indic
+    ०..९        # Devanagari
+    ০..৯        # Bengali
+    ੦..੯        # Gurmukhi
+    ૦..૯        # Gujarati
+    ୦..୯        # Oriya
+      etc.
+
+Other non-script 0..9 ranges may also be incremented, such as
+
+    ⁰..⁹        # superscripts (note, cycle includes latin-1 chars)
+    ₀..₉        # subscripts
+    ０..９      # fullwidth digits
+
+Conjecturally, any common sequence may be treated as a cycle even if it does
+not represent 0..9:

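The carry rule described in the r14460 change to S03 (find the final run of rangechars not preceded by '.', increment each position within its own cycle, and grow the string only when the leftmost position overflows) can be sketched as follows. This is a minimal Python model under stated assumptions, not the Perl 6 implementation: the function name `succ_str` is hypothetical, only the three ASCII cycles are included, and a string with no eligible run is returned unchanged, which the quoted text does not specify.

```python
import re

# Assumption: only the ASCII cycles are modelled here. The spec text
# lists many more Unicode ranges.
CYCLES = [
    "0123456789",
    "abcdefghijklmnopqrstuvwxyz",
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
]

def cycle_of(ch):
    """Return the cycle string that contains ch."""
    for cycle in CYCLES:
        if ch in cycle:
            return cycle
    raise ValueError(f"not a rangechar: {ch!r}")

# A maximal run of rangechars that is not immediately preceded by '.'.
RUN = re.compile(r"(?<![.0-9A-Za-z])[0-9A-Za-z]+")

def succ_str(s):
    """Increment the final eligible rangechar run in s, with carry."""
    runs = list(RUN.finditer(s))
    if not runs:
        return s  # assumption: nothing to increment, leave unchanged
    m = runs[-1]
    chars = list(m.group())
    i = len(chars) - 1
    while i >= 0:
        cycle = cycle_of(chars[i])
        nxt = (cycle.index(chars[i]) + 1) % len(cycle)
        chars[i] = cycle[nxt]
        if nxt != 0:
            break  # no wrap, so no carry to the left
        i -= 1
    else:
        # Leftmost position wrapped: insert a carry character of the
        # same range ('1' for digits, the first letter for alphabets).
        cycle = cycle_of(chars[0])
        chars.insert(0, cycle[1] if cycle[0] == "0" else cycle[0])
    return s[:m.start()] + "".join(chars) + s[m.end():]

for example in ["(zz99)", "(99zz)", "/tmp/pix000.jpg", "123.456"]:
    print(example, "->", succ_str(example))
```

Because the cycles do not overlap, the per-character step is a plain table lookup, which matches the single-successor-table remark in the later S03 revision.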
[svn:perl6-synopsis] r14461 - doc/trunk/design/syn

2007-09-13 Thread larry
Author: larry
Date: Thu Sep 13 17:26:46 2007
New Revision: 14461

Modified:
   doc/trunk/design/syn/S03.pod

Log:
typos and such


Modified: doc/trunk/design/syn/S03.pod
==
--- doc/trunk/design/syn/S03.pod (original)
+++ doc/trunk/design/syn/S03.pod Thu Sep 13 17:26:46 2007
@@ -18,7 +18,7 @@
 
 =head1 Overview
 
-For a summary of the changes from Perl 5, see L.
+For a summary of the changes from Perl 5, see L.
 
 =head1 Operator precedence
 
@@ -234,7 +234,7 @@
 
 4,3, sort 2,1   # 4,3,1,2
 
-As in Perl 5, a list operator looks like a term to the expression on
+As in Perl 5, a list operator looks like a term to the expression on
 its left, so it binds tighter than comma on the left but looser than
 comma on the right--see List operator precedence below.
 
@@ -276,7 +276,7 @@
 $obj.::Class::meth
 $obj.Class::meth    # same thing, assuming Class is predeclared
 
-As in Perl 5, tells the dispatcher which class to start searching from,
+As in Perl 5, tells the dispatcher which class to start searching from,
 not the exact method to call.
 
 =item *
@@ -328,7 +328,7 @@
 
 will always result in a compile-time error indicating the user should
 use C<< infix:<~> >> instead.  This is to catch an error likely to
-be made by Perl 5 programmers learning Perl 6.
+be made by Perl 5 programmers learning Perl 6.
 
 =back
 
@@ -341,7 +341,7 @@
 behavior unless some explicit sequencing operator is interposed.
 See L.
 
-As with all postfix operators in Perl 6, no space is allowed between
+As with all postfix operators in Perl 6, no space is allowed between
 a term and its postfix.  See S02 for why, and for how to work around the
 restriction with a "long dot".
 
@@ -355,9 +355,9 @@
 say $x unless %seen{$x}++;
 
 Increment of a C<Str> (in a suitable container) works similarly to
-Perl 5, but is generalized slightly.
+Perl 5, but is generalized slightly.
 A scan is made for the final alphanumeric sequence in
-the string that is not preceded by a '.' character.  Unlike in Perl 5, this
+the string that is not preceded by a '.' character.  Unlike in Perl 5, this
 alphanumeric sequence need not be anchored to the beginning of the
 string, nor does it need to begin with an alphabetic character;
 the final sequence in the string matching C<< <rangechar>+ >>
@@ -382,7 +382,7 @@
 
 Character positions are incremented within their natural range for
 any Unicode range that is deemed to represent the digits 0..9 or
-that is deemed to be a complete cyclical alphabet for a (one case
+that is deemed to be a complete cyclical alphabet for (one case
 of) a (Unicode) script.  Only scripts that represent their alphabet
 in codepoints that form a cycle independent of other alphabets may
 be so used.  (This specification defers to the users of such a script
@@ -396,8 +396,8 @@
 sends a "carry" to the position left of it, and that position is
 then incremented in its own range.  If and only if the leftmost
 position is exhausted in its range, an additional character of the
-same range is inserted to hold the carry in the same fashion as Perl
-5, so incrementing '(zz99)' turns into '(aaa00)' and incrementing
+same range is inserted to hold the carry in the same fashion as Perl 5,
+so incrementing '(zz99)' turns into '(aaa00)' and incrementing
 '(99zz)' turns into '(100aa)'.
 
 The following Unicode ranges are some of the possible rangechar ranges.
@@ -435,19 +435,19 @@
 ①..⑳# circled digits 1..20
 ⒜..⒵# parenthesize lc
 ⚀..⚅# die faces 1..6
-❶..❿# dingbat negative circuled 1..10
+❶..❿# dingbat negative circled 1..10
   etc.
 
 While it doesn't really make sense to "carry" such numbers when they
 reach the end of their cycle, treating such values as incrementable may
 be convenient for writing outlines and similar numbered bullet items.
 (Note that we can't just increment unrecognized characters, because
-we have to recognize the final sequence of rangechars before knowing
+we have to locate the string's final sequence of rangechars before knowing
 which portion of the string to increment.  Note also that all character
 increments can be handled by lookup in a single table of successors
 since we've defined our ranges not to include overlapping cycles.)
 
-Perl 6 also supports C<Str> decrement with similar semantics, simply by
+Perl 6 also supports C<Str> decrement with similar semantics, simply by
 running the cycles the other direction.  However, leftmost characters
 are never removed, and the decrement fails when you reach a string like
 "aaa" or "000".
@@ -543,8 +543,8 @@
 
 +$x
 
-Unlike in Perl 5, where C<+> is a no-op, this operator coerces
-to numeric context in Perl 6.  (It coerces only the value, not the
+Unlike in Perl 5, where C<+> is a no-op, this operator coerces
+to numeric context in Perl 6.  (It coerces only the value, not the
 original variable.)  The narrowest appropriate type of