Author: larry
Date: Mon Jul 7 21:30:08 2008
New Revision: 14557
Modified:
doc/trunk/design/syn/S05.pod
Log:
Clarify the role of whitespace within transliterations
Power up transliterations with regexes and closures
Formally define the implied alternation as equivalent to longest-token matching
Modified: doc/trunk/design/syn/S05.pod
==
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Mon Jul 7 21:30:08 2008
@@ -14,9 +14,9 @@
Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
Larry Wall <[EMAIL PROTECTED]>
Date: 24 Jun 2002
- Last Modified: 21 Jun 2008
+ Last Modified: 7 Jul 2008
Number: 5
- Version: 82
+ Version: 83
This document summarizes Apocalypse 5, which is about the new regex
syntax. We now try to call them I<regex> rather than "regular
@@ -3661,12 +3661,25 @@
     $str.=trans( 'A'=>'a', 'B'=>'b', 'C'=>'c' );
+Whitespace characters are taken literally as characters to be
+translated from or to. The C<..> range sequence is the only metasyntax
+recognized within a string, though you may of course use backslash
+interpolations in double quotes. If the right side is too short, the
+final character is replicated out to the length of the left string.
+If there is no final character because the right side is the null
+string, the result is deletion instead.
+
=item *
-The two sides of each pair may also be Array objects:
+Either or both sides of the pair may also be Array objects:
     $str.=trans( ['A'..'C'] => ['a'..'c'], <X Y Z> => <x y z> );
+The array version is the underlying primitive form: the semantics of
+the string form is exactly equivalent to first doing C<..> expansion
+and then splitting the string into individual characters and then
+using that as an array.
+
=item *
The array version can map one-or-more characters to one-or-more
@@ -3675,11 +3688,36 @@
     $str.=trans( [' ',      '<',    '>',    '&'    ] =>
                  ['&nbsp;', '&lt;', '&gt;', '&amp;' ]);
-
In the case that more than one sequence of input characters matches,
the longest one wins. In the case of two identical sequences the
first in order wins.
+=item *
+
+The recognition done by the string and array forms is very basic.
+To achieve greater power, any recognition element of the left side
+may be specified by a regex that can do character classes, lookahead,
+etc.
+
+
+    $str.=trans( [/ \h /,   '<',    '>',    '&'    ] =>
+                 ['&nbsp;', '&lt;', '&gt;', '&amp;' ]);
+
+    $str.=trans( / \s+ / => ' ' );  # squash all whitespace to one space
+
+These submatches are mixed into the overall match in exactly the same way that
+they are mixed into parallel alternation in ordinary regex processing, so
+longest token rules apply across all the possible matches specified to the
+transliteration operator. Once a match is made and transliterated, the parallel
+matching resumes at the new position following the end of the previous match,
+even if it matched multiple characters.
+
+=item *
+
+If the right side of the arrow is a closure, it is evaluated to
+determine the replacement value. If the left side was matched by a
+regex, the resulting match object is available within the closure.
+
=back
=head1 Substitution
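
As a rough illustration of the string and array forms described in the patch above, the following Python sketch models the stated rules: C<..> expansion, literal whitespace, replication of the final right-hand element, deletion when the right side is the null string, and longest-match selection with ties going to the first entry. It is not Perl 6 and is not taken from any implementation. The names expand, build_table, and trans are hypothetical, and regex and closure operands are deliberately left out.

```python
def expand(spec):
    """String form: expand 'A..C' ranges, then split into single characters.
    Array form (any non-string sequence) is used as-is."""
    if not isinstance(spec, str):
        return list(spec)
    out, i = [], 0
    while i < len(spec):
        # A range needs one character on each side of the '..'
        if i + 3 < len(spec) and spec[i + 1:i + 3] == "..":
            lo, hi = ord(spec[i]), ord(spec[i + 3])
            out.extend(chr(c) for c in range(lo, hi + 1))
            i += 4
        else:
            out.append(spec[i])  # whitespace is taken literally too
            i += 1
    return out


def build_table(pairs):
    """Flatten (left, right) pairs into an ordered list of (key, replacement)."""
    table = []
    for left, right in pairs:
        keys, vals = expand(left), expand(right)
        for n, key in enumerate(keys):
            if not vals:
                repl = ""  # null right side: deletion
            else:
                repl = vals[min(n, len(vals) - 1)]  # replicate final element
            table.append((key, repl))
    return table


def trans(s, *pairs):
    """Model of transliteration: at each position the longest key wins,
    and among equally long keys the first one listed wins."""
    table = build_table(pairs)
    out, pos = [], 0
    while pos < len(s):
        best = None
        for key, repl in table:
            if key and s.startswith(key, pos):
                if best is None or len(key) > len(best[0]):
                    best = (key, repl)
        if best is None:
            out.append(s[pos])  # no key matches: copy the character through
            pos += 1
        else:
            out.append(best[1])
            pos += len(best[0])  # resume after the whole match
    return "".join(out)
```

Under these assumptions, trans("ABCD", ("A..C", "a..c")) exercises range expansion, and passing two lists as a pair exercises the array form with multi-character replacements.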