[svn:perl6-synopsis] r14457 - doc/trunk/design/syn
Author: larry Date: Thu Sep 13 08:59:19 2007 New Revision: 14457 Modified: doc/trunk/design/syn/S05.pod Log: Suggestioned clarifications from lots of folks++ Modified: doc/trunk/design/syn/S05.pod == --- doc/trunk/design/syn/S05.pod (original) +++ doc/trunk/design/syn/S05.pod Thu Sep 13 08:59:19 2007 @@ -14,9 +14,9 @@ Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and Larry Wall <[EMAIL PROTECTED]> Date: 24 Jun 2002 - Last Modified: 11 Sep 2007 + Last Modified: 13 Sep 2007 Number: 5 - Version: 65 + Version: 66 This document summarizes Apocalypse 5, which is about the new regex syntax. We now try to call them I<regex> rather than "regular @@ -44,9 +44,6 @@ By the way, unlike in Perl 5, the numbered capture variables now start at C<$0> instead of C<$1>. See below. -During the execution of a match, the current match state is stored in a -C<$_> variable lexically scoped to an appropriate portion of the match. -This is transparent to the user for simple matches. =head1 Unchanged syntactic features @@ -333,11 +330,11 @@ If followed by an C<x>, it means repetition. Use C<:x(4)> for the general form. So - s:4x [ (<.ident>) = (\N+) $$] [$0 => $1]; + s:4x [ (<.ident>) = (\N+) $$] = "$0 => $1"; is the same as: - s:x(4) [ (<.ident>) = (\N+) $$] [$0 => $1]; + s:x(4) [ (<.ident>) = (\N+) $$] = "$0 => $1"; which is almost the same as: @@ -407,7 +404,7 @@ The new C<:rw> modifier causes this regex to I<claim> the current string for modification rather than assuming copy-on-write semantics. -All the bindings in C<$/> become lvalues into the string, such +All the captures in C<$/> become lvalues into the string, such that if you modify, say, C<$1>, the original string is modified in that location, and the positions of all the other fields modified accordingly (whatever that means). 
In the absence of this modifier @@ -662,20 +659,32 @@ \s+ { print "but does contain whitespace\n" } / -An B<explicit> reduce from a regex closure binds the I +An B<explicit> reduction using the C<make> function sets the I for this match: -/ (\d) { reduce $0.sqrt } Remainder /; +/ (\d) { make $0.sqrt } Remainder /; This has the effect of capturing the square root of the numified string, instead of the string. The C<Remainder> part is matched but is not returned -unless the first reduce is later overridden by another reduce. +unless the first C<make> is later overridden by another C<make>. -These closures are invoked with a topic (C<$_>) of the current match state. -Within a closure, the instantaneous position within the search is -denoted by the C<.pos> method on that object. As with all string positions, -you must not treat it as a number unless you are very careful about -which units you are dealing with. +These closures are invoked with a topic (C<$_>) of the current match +state (a C object). Within a closure, the instantaneous +position within the search is denoted by the C<.pos> method on +that object. As with all string positions, you must not treat it +as a number unless you are very careful about which units you are +dealing with. + +The C object can also return the original item that we are +matching against; this is available from the C<._> method, named to +remind you that it probably came from the user's C<$_> variable. +(But that may well be off in some other scope when indirect rules +are called, so we mustn't rely on the user's lexical scope.) + +The closure is also guaranteed to start with a C<$/> C<Match> object +representing the match so far. However, if the closure does its own +internal matching, its C<$/> variable will be rebound to the result +of I<that> match until the end of the embedded closure. =item * @@ -747,6 +756,11 @@ foo, foo,bar, +It is legal for the separator to be zero-width as long as the pattern on +the left progresses on each iteration: + +. 
**# match sequence of identical characters + =item * C<< <...> >> are now extensible metasyntax delimiters or I @@ -784,7 +798,7 @@ C<< <$var> >>. (See assertions below.) This form does not capture, and it fails if C<$var> is tainted. -However, a variable used as the left side of a binding or submatch +However, a variable used as the left side of an alias or submatch operator is not used for matching. $x = @@ -795,13 +809,41 @@ "$0" ~~ -It is non-sensical to bind to something that is not a variable: +On the other hand, it is non-sensical to alias to something that is +not a variable: "$0" = # ERROR +$0 =# okay +$x =# okay, temporary capture +$ = # okay, persistent capture + # same thing + +Variables declared in capture aliases are lexically scoped to the +rest of the regex. You should not confuse this use of C<=> with +either ordinary assignment or ordinary binding. You should read +the C<=> more like the
[svn:perl6-synopsis] r14458 - doc/trunk/design/syn
Author: pmichaud Date: Thu Sep 13 10:04:14 2007 New Revision: 14458 Modified: doc/trunk/design/syn/S05.pod Log: Fix up some unquoted punctuation in regexes. Modified: doc/trunk/design/syn/S05.pod == --- doc/trunk/design/syn/S05.pod (original) +++ doc/trunk/design/syn/S05.pod Thu Sep 13 10:04:14 2007 @@ -243,15 +243,15 @@ to be considered "significant"; they are replaced by a whitespace matching rule, C<< <.ws> >>. That is, - m:s/ next cmd = / + m:s/ next cmd '=' / is the same as: - m/ <.ws> next <.ws> cmd <.ws> = <.ws> / + m/ <.ws> next <.ws> cmd <.ws> '=' <.ws> / which is effectively the same as: - m/ \s* next \s+ cmd \s* = \s* / + m/ \s* next \s+ cmd \s* '=' \s* / But in the case of @@ -330,15 +330,15 @@ If followed by an C<x>, it means repetition. Use C<:x(4)> for the general form. So - s:4x [ (<.ident>) = (\N+) $$] = "$0 => $1"; + s:4x [ (<.ident>) '=' (\N+) $$] = "$0 => $1"; is the same as: - s:x(4) [ (<.ident>) = (\N+) $$] = "$0 => $1"; + s:x(4) [ (<.ident>) '=' (\N+) $$] = "$0 => $1"; which is almost the same as: - s:c[ (<.ident>) = (\N+) $$] = "$0 => $1" for 1..4; + s:c[ (<.ident>) '=' (\N+) $$] = "$0 => $1" for 1..4; except that the string is unchanged unless all four matches are found. However, ranges are allowed, so you can say C<:x(1..4)> to change anywhere @@ -462,7 +462,7 @@ The C<:i>, C<:s>, C<:Perl5>, and Unicode-level modifiers can be placed inside the regex (and are lexically scoped): - m/:s alignment = [:i left|right|cent[er|re]] / + m/:s alignment '=' [:i left|right|cent[er|re]] / As with modifiers outside, only parentheses are recognized as valid brackets for args to the adverb. 
In particular: @@ -2085,20 +2085,20 @@ can also be written: - $result = mm/ (\S+) => (\S+)/; + $result = mm/ (\S+) '=>' (\S+)/; ($key, $val) = @$result; To get a single capture into a string, use a subscript: - $mystring = "{ mm/ (\S+) => (\S+)/[0] }"; + $mystring = "{ mm/ (\S+) '=>' (\S+)/[0] }"; To get all the captures into a string, use a I slice: - $mystring = "{ mm/ (\S+) => (\S+)/[] }"; + $mystring = "{ mm/ (\S+) '=>' (\S+)/[] }"; Or cast it into an array: - $mystring = "@( mm/ (\S+) => (\S+)/ )"; + $mystring = "@( mm/ (\S+) '=>' (\S+)/ )"; Note that, as a scalar variable, C<$/> doesn't automatically flatten in list context. Use C<@()> as a shorthand for C<@($/)> to flatten @@ -2457,7 +2457,7 @@ C<|> or C<||> (but not after each C<&> or C<&&>). Hence: # $0 $1$2 $3$4 $5 - $tune_up = rx/ (don't) (ray) (me) (for) (solar tea), (d'oh!) + $tune_up = rx/ ("don't") (ray) (me) (for) (solar tea), ("d'oh!") # $0 $1 $2$3$4 | (every) (green) (BEM) (devours) (faces) /; @@ -2800,7 +2800,7 @@ This I behavior is particularly useful for reinstituting Perl5 semantics for consecutive subpattern numbering in alternations: - $tune_up = rx/ (don't) (ray) (me) (for) (solar tea), (d'oh!) + $tune_up = rx/ ("don't") (ray) (me) (for) (solar tea), ("d'oh!") | $6 = (every) (green) (BEM) (devours) (faces) # $7 $8$9$10 /; @@ -3267,9 +3267,9 @@ so too a grammar can collect a set of named rules together: grammar Identity { - rule name { Name = (\N+) } - rule age { Age = (\d+) } - rule addr { Addr = (\N+) } + rule name { Name '=' (\N+) } + rule age { Age '=' (\d+) } + rule addr { Addr '=' (\N+) } rule desc { \n \n
[svn:perl6-synopsis] r14459 - doc/trunk/design/syn
Author: larry Date: Thu Sep 13 10:32:53 2007 New Revision: 14459 Modified: doc/trunk/design/syn/S05.pod Log: grammaro from Coke++ Modified: doc/trunk/design/syn/S05.pod == --- doc/trunk/design/syn/S05.pod (original) +++ doc/trunk/design/syn/S05.pod Thu Sep 13 10:32:53 2007 @@ -106,7 +106,7 @@ moose* -quantifies only the 'e' and match "mooseee", saying +quantifies only the 'e' and matches "mooseee", saying 'moose'*
[svn:perl6-synopsis] r14460 - doc/trunk/design/syn
Author: larry Date: Thu Sep 13 16:29:08 2007 New Revision: 14460 Modified: doc/trunk/design/syn/S03.pod doc/trunk/design/syn/S05.pod Log: Clarifications requested by Wolfgang Laun++, and then some Modified: doc/trunk/design/syn/S03.pod == --- doc/trunk/design/syn/S03.pod (original) +++ doc/trunk/design/syn/S03.pod Thu Sep 13 16:29:08 2007 @@ -12,9 +12,9 @@ Maintainer: Larry Wall <[EMAIL PROTECTED]> Date: 8 Mar 2004 - Last Modified: 6 Sep 2007 + Last Modified: 13 Sep 2007 Number: 3 - Version: 121 + Version: 122 =head1 Overview @@ -355,31 +355,106 @@ say $x unless %seen{$x}++; Increment of a C<Str> (in a suitable container) works similarly to -Perl 5, but is generalized slightly. First, the string is examined -to see if it could be the string representation of a number in -any common representation, including floating point and radix -notation. (Surrounding whitespace is also allowed around such a -number.) If it appears to be a number, it is converted to a number -and incremented as a number. Otherwise, a scan is made for the -final alphanumeric sequence in the string. Unlike in Perl 5, this +Perl 5, but is generalized slightly. +A scan is made for the final alphanumeric sequence in +the string that is not preceded by a '.' character. Unlike in Perl 5, this alphanumeric sequence need not be anchored to the beginning of the -string, nor does it need to begin with an alphabetic character; the -final sequence in the string matching C<\w+> is incremented regardless -of what comes before it. For its typical use of incrementing a -filename, you don't have to worry about the path name, but you do -still have to worry about the extension, so you probably want to say - -my $fh = open $filename++ ~ '.jpg'; - -Alternately, you can increment a submatch: - -$filename ~~ s[ <( <<\w+>> )> \.\w+$] = $().succ; - -Perl 6 also supports C<Str> decrement with similar semantics. 
- -Increment and decrement are defined in terms of the C<.succ> and -C<.pred> methods on the type of object in the C container. -More specifically, +string, nor does it need to begin with an alphabetic character; +the final sequence in the string matching C<< <rangechar>+ >> +is incremented regardless of what comes before it. + +The C<< <rangechar> >> character class is defined as that subset of +C<\w> that Perl knows how to increment within a range, as defined +below. + +The additional matching behaviors provide two useful benefits: +for its typical use of incrementing a filename, you don't have to +worry about the path name or the extension: + +$file = "/tmp/pix000.jpg"; +$file++; # /tmp/pix001.jpg, not /tmp/pix000.jph + +Perhaps more to the point, if you happen to increment a string that ends +with a decimal number, it's likely to do the right thing: + +$num = "123.456"; +$num++; # 124.456, not 123.457 + +Character positions are incremented within their natural range for +any Unicode range that is deemed to represent the digits 0..9 or +that is deemed to be a complete cyclical alphabet for a (one case +of) a (Unicode) script. Only scripts that represent their alphabet +in codepoints that form a cycle independent of other alphabets may +be so used. (This specification defers to the users of such a script +for determining the proper cycle of letters.) We arbitrarily define +the ASCII alphabet not to intersect with other scripts that make use +of characters in that range, but alphabets that intersperse ASCII letters are +not allowed. + +If the current character in a string position is the final character +in such a range, it wraps to the first character of the range and +sends a "carry" to the position left of it, and that position is +then incremented in its own range. 
If and only if the leftmost +position is exhausted in its range, an additional character of the +same range is inserted to hold the carry in the same fashion as Perl +5, so incrementing '(zz99)' turns into '(aaa00)' and incrementing +'(99zz)' turns into '(100aa)'. + +The following Unicode ranges are some of the possible rangechar ranges. +For alphabets we might have ranges like: + +A..Z # ASCII uc +a..z # ASCII lc +Α..Ω # Greek uc +α..ω # Greek lc (presumably skipping U+03C2, final sigma) +א..ת # Hebrew + etc. # (XXX out of my depth here) + +For digits we have ranges like: + +0..9 # ASCII +٠..٩ # Arabic-Indic +०..९ # Devanagari +০..৯ # Bengali +੦..੯ # Gurmukhi +૦..૯ # Gujarati +୦..୯ # Oriya + etc. + +Other non-script 0..9 ranges may also be incremented, such as + +⁰..⁹ # superscripts (note, cycle includes latin-1 chars) +₀..₉ # subscripts +０..９ # fullwidth digits + +Conjecturally, any common sequence may be treated as a cycle even if it does +not represent 0..9:
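
As a rough illustration of the string increment rules that r14460 adds to S03, the following Python sketch models the scan and the carry. It is not the Perl 6 implementation. The names `succ_str` and `CYCLES` are hypothetical, and only the three ASCII cycles are modelled; the other Unicode ranges listed in the diff are left out for brevity.

```python
import re

# Assumption: only the ASCII cycles are modelled; the spec allows many more.
CYCLES = ["0123456789",
          "abcdefghijklmnopqrstuvwxyz",
          "ABCDEFGHIJKLMNOPQRSTUVWXYZ"]


def _cycle_of(ch):
    """Return the cycle containing ch (ch is always a rangechar here)."""
    for cycle in CYCLES:
        if ch in cycle:
            return cycle
    raise ValueError("not a rangechar: %r" % ch)


def succ_str(s):
    """Increment the final run of rangechars that is not preceded by '.'."""
    target = None
    for m in re.finditer(r"[0-9A-Za-z]+", s):
        if m.start() == 0 or s[m.start() - 1] != ".":
            target = m          # keep the last qualifying run
    if target is None:
        return s                # assumption: nothing to increment, unchanged

    chars = list(target.group())
    i = len(chars) - 1
    while True:
        cycle = _cycle_of(chars[i])
        pos = cycle.index(chars[i])
        if pos + 1 < len(cycle):
            chars[i] = cycle[pos + 1]   # no wrap, so no carry
            break
        chars[i] = cycle[0]             # wrap and carry to the left
        if i == 0:
            # Leftmost position exhausted: insert a character of the same
            # range, as in Perl 5 ('1' for digits, first letter otherwise).
            chars.insert(0, "1" if cycle[0] == "0" else cycle[0])
            break
        i -= 1
    return s[:target.start()] + "".join(chars) + s[target.end():]


print(succ_str("/tmp/pix000.jpg"))   # /tmp/pix001.jpg
print(succ_str("123.456"))           # 124.456
print(succ_str("(zz99)"))            # (aaa00)
print(succ_str("(99zz)"))            # (100aa)
```

The four printed cases are the examples given in the diff. Skipping any run preceded by '.' is what leaves both the file extension and the fractional part untouched.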
[svn:perl6-synopsis] r14461 - doc/trunk/design/syn
Author: larry Date: Thu Sep 13 17:26:46 2007 New Revision: 14461 Modified: doc/trunk/design/syn/S03.pod Log: typos and such Modified: doc/trunk/design/syn/S03.pod == --- doc/trunk/design/syn/S03.pod(original) +++ doc/trunk/design/syn/S03.podThu Sep 13 17:26:46 2007 @@ -18,7 +18,7 @@ =head1 Overview -For a summary of the changes from Perl 5, see L. +For a summary of the changes from Perl 5, see L. =head1 Operator precedence @@ -234,7 +234,7 @@ 4,3, sort 2,1 # 4,3,1,2 -As in Perl 5, a list operator looks like a term to the expression on +As in Perl 5, a list operator looks like a term to the expression on its left, so it binds tighter than comma on the left but looser than comma on the right--see List operator precedence below. @@ -276,7 +276,7 @@ $obj.::Class::meth $obj.Class::meth# same thing, assuming Class is predeclared -As in Perl 5, tells the dispatcher which class to start searching from, +As in Perl 5, tells the dispatcher which class to start searching from, not the exact method to call. =item * @@ -328,7 +328,7 @@ will always result in a compile-time error indicating the user should use C<< infix:<~> >> instead. This is to catch an error likely to -be made by Perl 5 programmers learning Perl 6. +be made by Perl 5 programmers learning Perl 6. =back @@ -341,7 +341,7 @@ behavior unless some explicit sequencing operator is interposed. See L. -As with all postfix operators in Perl 6, no space is allowed between +As with all postfix operators in Perl 6, no space is allowed between a term and its postfix. See S02 for why, and for how to work around the restriction with a "long dot". @@ -355,9 +355,9 @@ say $x unless %seen{$x}++; Increment of a C (in a suitable container) works similarly to -Perl 5, but is generalized slightly. +Perl 5, but is generalized slightly. A scan is made for the final alphanumeric sequence in -the string that is not preceded by a '.' character. Unlike in Perl 5, this +the string that is not preceded by a '.' character. 
Unlike in Perl 5, this alphanumeric sequence need not be anchored to the beginning of the string, nor does it need to begin with an alphabetic character; the final sequence in the string matching C<< + >> @@ -382,7 +382,7 @@ Character positions are incremented within their natural range for any Unicode range that is deemed to represent the digits 0..9 or -that is deemed to be a complete cyclical alphabet for a (one case +that is deemed to be a complete cyclical alphabet for (one case of) a (Unicode) script. Only scripts that represent their alphabet in codepoints that form a cycle independent of other alphabets may be so used. (This specification defers to the users of such a script @@ -396,8 +396,8 @@ sends a "carry" to the position left of it, and that position is then incremented in its own range. If and only if the leftmost position is exhausted in its range, an additional character of the -same range is inserted to hold the carry in the same fashion as Perl -5, so incrementing '(zz99)' turns into '(aaa00)' and incrementing +same range is inserted to hold the carry in the same fashion as Perl 5, +so incrementing '(zz99)' turns into '(aaa00)' and incrementing '(99zz)' turns into '(100aa)'. The following Unicode ranges are some of the possible rangechar ranges. @@ -435,19 +435,19 @@ ①..⑳# circled digits 1..20 ⒜..⒵# parenthesize lc ⚀..⚅# die faces 1..6 -❶..❿# dingbat negative circuled 1..10 +❶..❿# dingbat negative circled 1..10 etc. While it doesn't really make sense to "carry" such numbers when they reach the end of their cycle, treating such values as incrementable may be convenient for writing outlines and similar numbered bullet items. (Note that we can't just increment unrecognized characters, because -we have to recognize the final sequence of rangechars before knowing +we have to locate the string's final sequence of rangechars before knowing which portion of the string to increment. 
Note also that all character increments can be handled by lookup in a single table of successors since we've defined our ranges not to include overlapping cycles.) -Perl 6 also supports C<Str> decrement with similar semantics, simply by +Perl 6 also supports C<Str> decrement with similar semantics, simply by running the cycles the other direction. However, leftmost characters are never removed, and the decrement fails when you reach a string like "aaa" or "000". @@ -543,8 +543,8 @@ +$x -Unlike in Perl 5, where C<+> is a no-op, this operator coerces -to numeric context in Perl 6. (It coerces only the value, not the +Unlike in Perl 5, where C<+> is a no-op, this operator coerces +to numeric context in Perl 6. (It coerces only the value, not the original variable.) The narrowest appropriate type of
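
The decrement rule quoted in this revision (run the cycles in the other direction, never remove leftmost characters, fail at strings like "aaa" or "000") can be sketched with a single lookup table, in the spirit of the "single table of successors" remark. This is a Python model under stated assumptions, not the Perl 6 implementation. The names `pred_str`, `PRED` and `FIRST` are hypothetical, only ASCII cycles are covered, and "fails" is modelled here as raising `ValueError`.

```python
import re

# Assumption: only the ASCII cycles are modelled.
CYCLES = ["0123456789",
          "abcdefghijklmnopqrstuvwxyz",
          "ABCDEFGHIJKLMNOPQRSTUVWXYZ"]

# One predecessor table for all cycles; this works because no character
# belongs to more than one cycle. The first character maps to the last.
PRED = {ch: cycle[i - 1] for cycle in CYCLES for i, ch in enumerate(cycle)}
FIRST = {cycle[0] for cycle in CYCLES}


def pred_str(s):
    """Decrement the final run of rangechars that is not preceded by '.'."""
    target = None
    for m in re.finditer(r"[0-9A-Za-z]+", s):
        if m.start() == 0 or s[m.start() - 1] != ".":
            target = m          # keep the last qualifying run
    if target is None:
        raise ValueError("no rangechar sequence to decrement")

    chars = list(target.group())
    if all(ch in FIRST for ch in chars):
        # Strings such as "aaa" or "000": the decrement fails.
        raise ValueError("cannot decrement %r" % target.group())

    i = len(chars) - 1
    while True:
        borrow = chars[i] in FIRST      # wrapping backwards needs a borrow
        chars[i] = PRED[chars[i]]
        if not borrow:
            break
        i -= 1                          # safe: some char to the left is not first
    return s[:target.start()] + "".join(chars) + s[target.end():]


print(pred_str("/tmp/pix001.jpg"))   # /tmp/pix000.jpg
print(pred_str("(100aa)"))           # (099zz)
```

In the second case the leading digit becomes '0' rather than disappearing, which is one reading of "leftmost characters are never removed".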