Author: larry
Date: Tue Aug 1 11:57:10 2006
New Revision: 10536

Modified:
   doc/trunk/design/syn/S03.pod
   doc/trunk/design/syn/S05.pod

Log:
Fixes suggested by agentzh++.

Modified: doc/trunk/design/syn/S03.pod
==============================================================================
--- doc/trunk/design/syn/S03.pod (original)
+++ doc/trunk/design/syn/S03.pod Tue Aug 1 11:57:10 2006
@@ -12,11 +12,11 @@

   Maintainer: Larry Wall <[EMAIL PROTECTED]>
   Date: 8 Mar 2004
-  Last Modified: 19 Jul 2006
+  Last Modified: 1 Aug 2006
   Number: 3
-  Version: 51
+  Version: 52

-=head1 Changes to existing operators
+=head1 Changes to Perl 5 operators

 Several operators have been given new names to increase clarity and better
 Huffman-code the language, while others have changed precedence. (If an
@@ -26,6 +26,9 @@

 =over

+=item * Perl 5's C<${...}>, C<@{...}>, C<%{...}>, etc. dereferencing
+forms are now C<$(...)>, C<@(...)>, C<%(...)>, etc. instead.
+
 =item * C<< -> >> becomes C<.>, like the rest of the world uses.

 =item * The string concatenation C<.> becomes C<~>. Think of it as
@@ -1442,7 +1445,7 @@
                         !== !~~ !eq !=:= !=== !eqv etc.
     tight and           &&
     tight or            || ^^ //
-    ternary             ?? !!
+    conditional         ?? !!
     assignment          := ::= =>
                         (also = with simple lvalues)
                         += -= **= xx= .= etc.

Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Tue Aug 1 11:57:10 2006
@@ -14,9 +14,9 @@
    Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
                Larry Wall <[EMAIL PROTECTED]>
    Date: 24 Jun 2002
-   Last Modified: 1 July 2006
+   Last Modified: 1 Aug 2006
    Number: 5
-   Version: 28
+   Version: 29

 This document summarizes Apocalypse 5, which is about the new regex
 syntax. We now try to call them I<regex> because they haven't been
@@ -94,7 +94,7 @@
     m:g:i/\s* (\w*) \s* ,?/;

 Every modifier must start with its own colon. The delimiter must be
-separated from the final modifier by whitespace if it would be taken
+separated from the final modifier by whitespace if it would otherwise be taken
 as an argument to the preceding modifier (which is true for any
 bracketing character).

@@ -199,7 +199,7 @@
 match variants are defined for them:

     ms/match some words/ # same as m:sigspace
-    ss/match some words/replace those words/ # same ss s:sigspace
+    ss/match some words/replace those words/ # same as s:sigspace

 Conjecture: This might become sufficiently idiomatic that C<ms//> would
 be better as a "stuttered" C<mm//> instead, much as C<qq//> became idiomatic.
@@ -497,7 +497,7 @@
     / [foo]**{1,3} /

 (At least, it fails in the absence of C<use rx :listquantifier>,
-which is likely to be unimplemented in Perl 6.0.0 anyway).
+which is likely to be unimplemented in Perl 6.0.0 anyway.)

 The optimizer will likely optimize away things like C<**{1..*}>
 so that the closure is never actually run in that case. But it's
@@ -784,7 +784,7 @@

 =item *

-A leading C<?{> or C<!{>indicates a code assertion:
+A leading C<?{> or C<!{> indicates a code assertion:

     / (\d**{1..3}) <?{ $0 < 256 }> /
     / (\d**{1..3}) <!{ $0 < 256 }> /
@@ -1011,7 +1011,7 @@
 The Perl 6 equivalents are:

     regex { pattern } # always takes {...} as delimiters
-    rx / pattern / # can take (almost any) chars as delimiters
+    rx / pattern / # can take (almost any) chars as delimiters

 You may not use whitespace or alphanumerics for delimiters. Space is
 optional unless needed to distinguish from modifier arguments or
@@ -1021,14 +1021,14 @@
     rx ( pattern ) # okay
     rx( 1,2,3 ) # tries to call rx function

-(This is true of all quotelike constructs in Perl 6.)
+(This is true for all quotelike constructs in Perl 6.)

 =item *

 If either form needs modifiers, they go before the opening delimiter:

     $regex = regex :g:s:i { my name is (.*) };
-    $regex = rx:g:s:i / my name is (.*) /; # same thing
+    $regex = rx:g:s:i / my name is (.*) /; # same thing

 Space is necessary after the final modifier if you use any
 bracketing character for the delimiter. (Otherwise it would be taken as
@@ -1050,7 +1050,7 @@
 =item *

 As the syntax indicates, it is now more closely analogous to a C<sub {...}>
-constructor. In fact, that analogy will run I<very> deep in Perl 6.
+constructor. In fact, that analogy runs I<very> deep in Perl 6.

 =item *

@@ -1120,10 +1120,10 @@

     regex ident { [ <alpha>: | _: ]: \w+: }

-but rather easier to read. The bare C<*>, C<+> and C<?> quantifiers
+but rather easier to read. The bare C<*>, C<+>, and C<?> quantifiers
 never backtrack in a C<token> unless some outer regex has specified a
 C<:panic> option that applies. If you want to prevent even that, use
-C<*:>, C<+:> or C<?:> to prevent any backtracking into the quantifier.
+C<*:>, C<+:>, or C<?:> to prevent any backtracking into the quantifier.
 If you want to explicitly backtrack, append either a C<?> or a C<+>
 to the quantifier. The C<?> forces minimal matching as usual,
 while the C<+> forces greedy matching. The C<token> declarator is
@@ -1248,7 +1248,7 @@
 =item *

 Attempting to backtrack past a C<< <cut> >> causes the complete match
-to fail (like backtracking past a C<< <commit> >>. This is because there's
+to fail (like backtracking past a C<< <commit> >>). This is because there's
 now no preceding text to backtrack into.

 =item *
@@ -1546,7 +1546,7 @@
 =item *

 Inside a regex, the C<$/> variable holds the current regex's
-incomplete C<Match> object (which can be modified via the internal C<$/>.
+incomplete C<Match> object (which can be modified via the internal C<$/>).
 For example:

     $str ~~ / foo # Match 'foo'
@@ -1651,13 +1651,13 @@
 =item *

 The array elements of the regex's C<Match> object (i.e. C<$/>)
-store individual C<Match> objects representing the substrings that where
+store individual C<Match> objects representing the substrings that were
 matched and captured by the first, second, third, etc. I<outermost>
 (i.e. unnested) subpatterns. So these elements can be treated like
 fully fledged match results. For example:

     if m/ (\d\d\d\d)-(\d\d)-(\d\d) (BCE?|AD|CE)?/ {
-        ($yr, $mon, $day) = $/[0..2]
+        ($yr, $mon, $day) = $/[0..2];
         $era = "$3" if $3; # stringify/boolify
         @datepos = ( $0.from() .. $2.to() ); # Call Match methods
     }
@@ -1672,8 +1672,8 @@
 =item *

 Substrings matched by I<nested> subpatterns (i.e. nested capturing
-parens) are assigned to the array inside the subpattern's parent C<Match>
-surrounding subpattern, not to the array of C<$/>.
+parens) are assigned to the array inside the nested subpattern's parent C<Match>
+object, not to the array of C<$/>.

 =item *

@@ -1721,7 +1721,7 @@

     if m/ (\w+) \: (\w+ \s+)* / {
         say "Key: $0"; # Unquantified --> single Match
-        say "Values: { @{$1} }"; # Quantified --> array of Match
+        say "Values: @($1)"; # Quantified --> array of Match
     }


@@ -1746,14 +1746,14 @@

 Non-capturing brackets I<don't> create a separate nested lexical scope,
 so the two subpatterns inside them are actually still in the regex's
-top-level scope. Hence their top-level designations: C<$0> and C<$1>.
+top-level scope, hence their top-level designations: C<$0> and C<$1>.

 =item *

 However, because the two subpatterns are inside a quantified
 structure, C<$0> and C<$1> will each contain an array.
 The elements of that array will be the submatches returned by the
-corresponding subpattern on each iteration of the non-capturing
+corresponding subpatterns on each iteration of the non-capturing
 parentheses. For example:

     my $text = "foo:food fool\nbar:bard barb";
@@ -1870,7 +1870,7 @@

 =item *

-Any bracketed construct that is aliased (see L<Aliasing> below) to a
+Any bracketed construct that is aliased (see L</Aliasing> below) to a
 named variable is also a subrule.

 =item *
@@ -1921,7 +1921,7 @@
 =item *

 Note that it makes no difference whether a subrule is angle-bracketed
-(C<< <ident> >>) or aliased (C<< $<ident> := (<alpha>\w*) >>. The name's
+(C<< <ident> >>) or aliased (C<< $<ident> := (<alpha>\w*) >>). The name's
 the thing.


@@ -1957,16 +1957,16 @@
         $to = $<file>[1];
     }

-Likewise, with a mixture of both:
+And with a mixture of both:

     if ms/ mv <file>+ <file> / {
-        $to = pop @{$<file>};
-        @from = @{$<file>};
+        $to = pop @($<file>);
+        @from = @($<file>);
     }

 =item *

-However, if a subrule is explicitly renamed (or aliased -- see L<Aliasing>),
+However, if a subrule is explicitly renamed (or aliased -- see L</Aliasing>),
 then only the I<final> name counts when deciding whether it is or isn't
 repeated. For example:

@@ -2030,7 +2030,7 @@
     ms/ $<key>:=( (<[A..E]>) (\d**{3..6}) (X?) ) /;

 then the outer capturing parens no longer capture into the array of
-C<$/> (like unaliased parens would). Instead the aliased parens capture
+C<$/> as unaliased parens would. Instead the aliased parens capture
 into the hash of C<$/>; specifically into the hash element
 whose key is the alias name.

@@ -2068,7 +2068,7 @@

 Another way to think about this behavior is that aliased parens create
 a kind of lexically scoped named subrule; that the contents of the
-brackets are treated as if they were part of a separate subrule whose
+parentheses are treated as if they were part of a separate subrule whose
 name is the alias.


@@ -2080,14 +2080,14 @@

 =item *

-If an named scalar alias is applied to a set of I<non-capturing> brackets:
+If a named scalar alias is applied to a set of I<non-capturing> brackets:

     #           ___/non-capturing brackets\__
     #          |                             |
     #          |                             |
     ms/ $<key>:=[ (<[A..E]>) (\d**{3..6}) (X?) ] /;

-then the corresponding C<< $/<key> >> object contains only the string
+then the corresponding C<< $/<key> >> Match object contains only the string
 matched by the non-capturing brackets.

 =item *
@@ -2135,7 +2135,7 @@
 entry whose key is the name of the alias. And it I<no longer> assigns
 anything to the hash entry whose key is the subrule name. That is:

-    if m:/ ID\: $<id>:=<ident> / {
+    if m/ ID\: $<id>:=<ident> / {
         say "Identified as $/<id>"; # $/<ident> is undefined
     }

@@ -2146,7 +2146,7 @@
 the same subrule in the same scope. For example:

     if ms/ mv <file>+ $<dir>:=<file> / {
-        @from = @{$<file>};
+        @from = @($<file>);
         $to = $<dir>;
     }

@@ -2162,7 +2162,7 @@

     m/ $1:=(<-[:]>*) \: $0:=<ident> /

-the behavior is exactly the same as for a named alias (i.e the various
+the behavior is exactly the same as for a named alias (i.e. the various
 cases described above), except that the resulting C<Match> object is
 assigned to the corresponding element of the appropriate array rather
 than to an element of the hash.
@@ -2288,7 +2288,7 @@

 =item *

-An alias can also be specified using an array as the alias instead of scalar.
+An alias can also be specified using an array as the alias instead of a scalar.
 For example:

     m/ mv @<from>:=[(\S+) \s+]* <dir> /;
@@ -2310,12 +2310,12 @@
     # Aliasing to @<names> means $/<names> is always
     # an Array object, so...

-    say @{$/<names>};
+    say @($/<names>);

 =item *

 For convenience and consistency, C<< @<key> >> can also be used outside a
-regex, as a shorthand for C<< @{ $/<key> } >>. That is:
+regex, as a shorthand for C<< @( $/<key> ) >>. That is:

     ms/ Mr?s? @<names>:=<ident> W\. @<names>:=<ident>
       | Mr?s? @<names>:=<ident>
@@ -2337,7 +2337,7 @@

     m/ mv @<files>:=[ f.. \s* ]* /;  # $/<files> assigned an array,
                                      # each element of which is a
-                                     # C<Match> object containing
+                                     # Match object containing
                                      # the substring matched by Nth
                                      # repetition of the non-
                                      # capturing bracket match
@@ -2356,7 +2356,7 @@
     # of Match objects, each of which has its own array
     # of two subcaptures...

-    for @{$<pairs>} -> $pair {
+    for @($<pairs>) -> $pair {
         say "Key: $pair[0]";
         say "Val: $pair[1]";
     }
@@ -2368,7 +2368,7 @@
     # of Match objects, each of which is flattened out of
     # the two subcaptures within the subpattern

-    for @{$<pairs>} -> $key, $val {
+    for @($<pairs>) -> $key, $val {
         say "Key: $key";
         say "Val: $val";
     }
@@ -2388,7 +2388,7 @@
     # Match objects, each of which is the result of the
     # <pair> subrule call...

-    for @{$<pairs>} -> $pair {
+    for @($<pairs>) -> $pair {
         say "Key: $pair[0]";
         say "Val: $pair[1]";
     }
@@ -2401,7 +2401,7 @@
     # nested arrays inside the Match objects returned
     # by each match of the <pair> subrule...

-    for @{$<pairs>} -> $key, $val {
+    for @($<pairs>) -> $key, $val {
         say "Key: $key";
         say "Val: $val";
     }
@@ -2433,7 +2433,7 @@
     # \___ Array alias, so $0 gets a flattened array of
     # just the (\w+) captures from each repetition

-    @from = @{$0}; # Flattened list
+    @from = @($0); # Flattened list

     $to_str = $1[0][0]; # Nested elems of
     $to_gap = $1[0][1]; # unflattened list
@@ -2442,7 +2442,7 @@
 =item *

 Note again that, outside a regex, C<@0> is simply a shorthand for
-C<@{$0}>, so the first assignment above could also have been written:
+C<@($0)>, so the first assignment above could also have been written:

     @from = @0;

@@ -2470,7 +2470,7 @@

 If a hash alias is applied to a subrule or subpattern then the first nested
 numeric capture becomes the key of each hash entry and any remaining numeric
-captures become the values (in an array if there is more than one),
+captures become the values (in an array if there is more than one).

 =item *

@@ -2483,22 +2483,22 @@
     if ms/ %0:=<one_to_many>+ / {
         # $/[0] contains a hash, in which each key is provided by
         # the first subcapture within C<one_to_many>, and each
-        # value is an array containing the
-        # subrule's second, third, and fourth, etc. subcaptures...
+        # value is an array containing the
+        # subrule's second, third, fourth, etc. subcaptures...

-        for %{$/[0]} -> $pair {
-            say "One: $pair.key";
-            say "Many: { @{$pair.value} }";
+        for %($/[0]) -> $pair {
+            say "One: $pair.key()";
+            say "Many: { @($pair.value) }";
         }
     }

 =item *

-Outside the regex, C<%0> is a shortcut for C<%{$0}>:
+Outside the regex, C<%0> is a shortcut for C<%($0)>:

     for %0 -> $pair {
-        say "One: $pair.key";
-        say "Many: { @{$pair.value} }";
+        say "One: $pair.key()";
+        say "Many: @($pair.value)";
     }


@@ -2521,9 +2521,9 @@
 =item *

 In this case, the behavior of each alias is exactly as described in the
-previous sections, except that the resulting capture(s) are bound
-directly (but still hypothetically) to the variables of the specified
-name that exist in the scope in which the regex is declared.
+previous sections, except that any resulting capture is bound
+directly (but still hypothetically) to the variable of the specified
+name that must already exist in the scope in which the regex is declared.

 =back

@@ -2776,7 +2776,7 @@

 =item *

-The two sides of the any pair can be strings interpreted as C<tr///> would:
+The two sides of any pair can be strings interpreted as C<tr///> would:

     $str.=trans( 'A..C' => 'a..c', 'XYZ' => 'xyz' );

@@ -2806,10 +2806,10 @@
 There are also method forms of C<m//> and C<s///>:

     $str.match(//);
-    $str.subst(//, "replacement")
-    $str.subst(//, {"replacement"})
-    $str.=subst(//, "replacement")
-    $str.=subst(//, {"replacement"})
+    $str.subst(//, "replacement");
+    $str.subst(//, {"replacement"});
+    $str.=subst(//, "replacement");
+    $str.=subst(//, {"replacement"});

 =back

@@ -2830,14 +2830,14 @@
 graphemes. If used with an integer, the C<at> assertion will assume
 you mean the current lexically scoped Unicode level, on the assumption
 that this integer was somehow generated in this same lexical scope.
-If this is outside the current string's allowed abstraction levels, an
+If this is outside the current string's allowed Unicode abstraction levels, an
 exception is thrown. See S02 for more discussion of string positions.

 =item *

 C<Buf> types are based on fixed-width cells and can therefore handle
 integer positions just fine, and treat them as array indices.
-In particular, C<buf8> AKA C<buf> is just an old-school byte string.
+In particular, C<buf8> (also known as C<buf>) is just an old-school byte string.
 Matches against C<Buf> types are restricted to ASCII semantics in the
 absence of an I<explicit> modifier asking for the array's values
 to be treated as some particular encoding such as UTF-32. (This is
@@ -2874,7 +2874,7 @@

 The special C<< <,> >> subrule matches the boundary between elements.
 The C<< <elem> >> assertion matches any individual array element.
-It is the equivalent of "dot" for the whole element.
+It is the equivalent of the "dot" metacharacter for the whole element.

 If the array elements are strings, they are concatenated virtually into
 a single logical string. If the array elements are tokens or other
@@ -2895,7 +2895,7 @@
 Please be aware that the warnings on C<.from> and C<.to> returning
 opaque objects goes double for matching against an array, where a
 particular position reflects both a position within the array and
-(potentially) a positional within a string of that array. Do not
+(potentially) a position within a string of that array. Do not
 expect to do math with such values. Nor should you expect to
 be able to extract a substr that crosses element boundaries.

@@ -2903,6 +2903,6 @@

 To match against each element of an array, use a hyper operator:

-    @array».match($regex)
+    @array».match($regex);

 =back