spaces and transliteration

2008-07-07 Thread Chris Fields
I am working on the transliteration method operator (trans()) for  
Rakudo and wanted to get some input on how character ranges are to be  
used.


Should spaces be ignored in ranges like 'A .. Z'?  Currently the  
implementation I have ignores those spaces but counts any other spaces  
as important, so (using parrot perl6.pbc with my patch):


> say "Whfg nabgure Crey unpxre".trans(' a .. z' => '_n .. za .. m',  
'A .. Z' => 'N .. ZA .. M')

Just_another_Perl_hacker

chris


[svn:perl6-synopsis] r14557 - doc/trunk/design/syn

2008-07-07 Thread larry
Author: larry
Date: Mon Jul  7 21:30:08 2008
New Revision: 14557

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Clarify the role of whitespace within transliterations
Power up transliterations with regexes and closures
Formally define the implied alternation as equivalent to longest-token matching


Modified: doc/trunk/design/syn/S05.pod
==
--- doc/trunk/design/syn/S05.pod	(original)
+++ doc/trunk/design/syn/S05.pod	Mon Jul  7 21:30:08 2008
@@ -14,9 +14,9 @@
Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
Larry Wall <[EMAIL PROTECTED]>
Date: 24 Jun 2002
-   Last Modified: 21 Jun 2008
+   Last Modified: 7 Jul 2008
Number: 5
-   Version: 82
+   Version: 83
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> rather than "regular
@@ -3661,12 +3661,25 @@
 
  $str.=trans( 'A'=>'a', 'B'=>'b', 'C'=>'c' );
 
+Whitespace characters are taken literally as characters to be
+translated from or to.  The C<..> range sequence is the only metasyntax
+recognized within a string, though you may of course use backslash
+interpolations in double quotes.  If the right side is too short, the
+final character is replicated out to the length of the left string.
+If there is no final character because the right side is the null
+string, the result is deletion instead.
+
 =item *
 
-The two sides of each pair may also be Array objects:
+Either or both sides of the pair may also be Array objects:
 
  $str.=trans( ['A'..'C'] => ['a'..'c'], <X Y Z> => <x y z> );
 
+The array version is the underlying primitive form: the semantics of
+the string form is exactly equivalent to first doing C<..> expansion
+and then splitting the string into individual characters and then
+using that as an array.
+
 =item *
 
 The array version can map one-or-more characters to one-or-more
@@ -3675,11 +3688,36 @@
  $str.=trans( [' ',      '<',    '>',    '&'    ] =>
               ['&nbsp;', '&lt;', '&gt;', '&amp;' ]);
 
-
 In the case that more than one sequence of input characters matches,
 the longest one wins.  In the case of two identical sequences the
 first in order wins.
 
+=item *
+
+The recognition done by the string and array forms is very basic.
+To achieve greater power, any recognition element of the left side
+may be specified by a regex that can do character classes, lookahead,
+etc.
+
+
+$str.=trans( [/ \h /,   '<',    '>',    '&'    ] =>
+             ['&nbsp;', '&lt;', '&gt;', '&amp;' ]);
+
+$str.=trans( / \s+ /, ' ' );  # squash all whitespace to one space
+
+These submatches are mixed into the overall match in exactly the same way that
+they are mixed into parallel alternation in ordinary regex processing, so
+longest token rules apply across all the possible matches specified to the
+transliteration operator.  Once a match is made and transliterated, the parallel
+matching resumes at the new position following the end of the previous match,
+even if it matched multiple characters.
+
+=item *
+
+If the right side of the arrow is a closure, it is evaluated to
+determine the replacement value.  If the left side was matched by a
+regex, the resulting match object is available within the closure.
+
 =back
 
 =head1 Substitution
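
The string-form rules added in r14557 (whitespace is literal, ".." is the only metasyntax, a short right side replicates its final character, a null right side deletes) can be modelled outside Perl 6. What follows is a minimal Python sketch under that reading of the patch, not Rakudo's implementation. The names expand_ranges and trans_string are hypothetical, and treating a reversed range such as 'z..a' as empty is an assumption the patch does not address.

```python
def expand_ranges(spec):
    """Expand each 'x..y' range in spec into its characters.

    Every other character, whitespace included, is kept literally.
    Assumption: a reversed range (start > end) expands to nothing.
    """
    out = []
    i = 0
    while i < len(spec):
        # A range is: start char, the two-character '..' token, end char.
        if i + 3 < len(spec) and spec[i + 1:i + 3] == "..":
            start, end = ord(spec[i]), ord(spec[i + 3])
            out.extend(chr(c) for c in range(start, end + 1))
            i += 4
        else:
            out.append(spec[i])
            i += 1
    return out


def trans_string(text, *pairs):
    """Apply (left, right) string pairs in the manner of the string form."""
    table = {}
    for left, right in pairs:
        src = expand_ranges(left)
        dst = expand_ranges(right)
        for idx, ch in enumerate(src):
            if ch in table:
                continue  # an earlier mapping for the same character wins
            if not dst:
                table[ch] = ""  # null right side: delete the character
            else:
                # A short right side replicates its final character.
                table[ch] = dst[min(idx, len(dst) - 1)]
    # Characters with no mapping pass through unchanged.
    return "".join(table.get(ch, ch) for ch in text)


rot13 = trans_string(
    "Whfg nabgure Crey unpxre",
    (" a..z", "_n..za..m"),  # the leading space is literal and maps to '_'
    ("A..Z", "N..ZA..M"),
)
print(rot13)
```

Because the leading space in ' a..z' is an ordinary character, it pairs with '_' on the right, which is how the underscores in Chris's output arise once the spaces around ".." are dropped.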


[svn:perl6-synopsis] r14558 - doc/trunk/design/syn

2008-07-07 Thread larry
Author: larry
Date: Mon Jul  7 21:40:33 2008
New Revision: 14558

Modified:
   doc/trunk/design/syn/S05.pod

Log:
more clarifications and remember to thank cjfields++ this time :)


Modified: doc/trunk/design/syn/S05.pod
==
--- doc/trunk/design/syn/S05.pod	(original)
+++ doc/trunk/design/syn/S05.pod	Mon Jul  7 21:40:33 2008
@@ -3692,6 +3692,9 @@
 the longest one wins.  In the case of two identical sequences the
 first in order wins.
 
+As with the string form, missing righthand elements replicate the
+final element, and a null array results in deletion instead.
+
 =item *
 
 The recognition done by the string and array forms is very basic.
@@ -3703,7 +3706,8 @@
 $str.=trans( [/ \h /,   '<',    '>',    '&'    ] =>
              ['&nbsp;', '&lt;', '&gt;', '&amp;' ]);
 
-$str.=trans( / \s+ /, ' ' );  # squash all whitespace to one space
+$str.=trans( / \s+ / => ' ' );  # squash all whitespace to one space
+$str.=trans( / <-alpha> / => '' );  # delete all non-alpha
 
 These submatches are mixed into the overall match in exactly the same way that
 they are mixed into parallel alternation in ordinary regex processing, so
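
The array-form rules from these two commits (longest match wins, ties go to the first entry, matching resumes after the matched text, missing right-hand elements replicate the final one, a null array deletes, and a closure on the right computes the replacement) can be sketched the same way. This is a rough Python model and not the Perl 6 semantics themselves: trans_array is a hypothetical name, and Python's re module backtracks rather than doing longest-token matching, so "longest" here only compares the lengths that each entry happens to match at the current position.

```python
import re


def trans_array(text, sources, targets):
    """Transliterate text using parallel lists of sources and targets.

    sources: literal strings or compiled regexes.
    targets: replacement strings or callables; a callable receives the
             re.Match for a regex source, or the literal source string.
    """
    out = []
    pos = 0
    while pos < len(text):
        best_len, best_idx, best_match = 0, None, None
        for idx, src in enumerate(sources):
            if isinstance(src, re.Pattern):
                m = src.match(text, pos)
                length = len(m.group(0)) if m else 0
                matched = m
            else:
                length = len(src) if text.startswith(src, pos) else 0
                matched = src
            # Strict '>' keeps the first entry on ties and ignores
            # zero-length matches, so the scan always advances.
            if length > best_len:
                best_len, best_idx, best_match = length, idx, matched
        if best_idx is None:
            out.append(text[pos])  # nothing matched: copy one character
            pos += 1
            continue
        if not targets:
            replacement = ""  # null array: deletion
        else:
            # Missing right-hand elements replicate the final element.
            replacement = targets[min(best_idx, len(targets) - 1)]
        if callable(replacement):
            replacement = replacement(best_match)
        out.append(replacement)
        pos += best_len  # resume after the whole match
    return "".join(out)


escaped = trans_array(
    "a < b",
    [" ", "<", ">", "&"],
    ["&nbsp;", "&lt;", "&gt;", "&amp;"],
)
squashed = trans_array("a  \t b", [re.compile(r"\s+")], [" "])
doubled = trans_array(
    "x42y",
    [re.compile(r"\d+")],
    [lambda m: str(int(m.group(0)) * 2)],  # closure sees the match object
)
print(escaped, squashed, doubled)
```

The first call uses an HTML-escaping table, the second squashes runs of whitespace to one space, and the third shows a closure computing its replacement from the match.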