Author: larry
Date: Wed Mar 19 18:50:05 2008
New Revision: 14526

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Added :samespace and :ss as :sigspace variant


Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod        (original)
+++ doc/trunk/design/syn/S05.pod        Wed Mar 19 18:50:05 2008
@@ -16,7 +16,7 @@
    Date: 24 Jun 2002
    Last Modified: 19 Mar 2008
    Number: 5
-   Version: 75
+   Version: 76
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> rather than "regular
@@ -190,7 +190,7 @@
 ignored in its lexical scope, but not in its dynamic scope.  That is,
 subrules always use their own case settings.
 
-The C<:ii> variant may be used on a substitution to change the
+The C<:ii> (or C<:samecase>) variant may be used on a substitution to change the
 substituted string to the same case pattern as the matched string.
 
 If the pattern is matched without the C<:sigspace> modifier, case
@@ -237,7 +237,7 @@
 includes all ignored characters, including any that follow the final
 base character.
 
-The C<:bb> variant may be used on a substitution to change the
+The C<:bb> (or C<:samebase>) variant may be used on a substitution to change the
 substituted string to the same accent pattern as the matched string.
 Accent info is carried across on a character by character basis.  If
 the right string is longer than the left one, the remaining characters
@@ -326,11 +326,41 @@
 between sigspace and whitespace is primarily metaphorical, which is
 why the correspondence is both useful and (potentially) confusing.
 
+The C<:ss> (or C<:samespace>) variant may be used on substitutions to
+do smart space mapping.  For each sigspace-induced call to C<< <ws> >>
+on the left, the matched whitespace is copied over to the corresponding
+slot on the right, as represented by a single whitespace character
+in the replacement string wherever space replacement is desired.
+If there are more whitespace slots on the right than the left, those
+righthand characters remain themselves.  If there are not enough
+whitespace slots on the right to map all the available whitespace
+slots from the match, the algorithm tries to minimize information
+loss by randomly splicing "common" whitespace characters out of the
+list of whitespace.  From least valuable to most, the pecking order is:
+
+    spaces
+    tabs
+    all other horizontal whitespace, including Unicode
+    newlines (including crlf as a unit)
+    all other vertical whitespace, including Unicode
+
+The primary intent of these rules is to minimize format disruption
+when substitution happens across line boundaries and such.  There is,
+of course, no guarantee that the result will be exactly what a human would
+do.
+
 The C<:s> modifier is considered sufficiently important that
 match variants are defined for them:
 
     mm/match some words/                        # same as m:sigspace
-    ss/match some words/replace those words/    # same as s:sigspace
+    ss/match some words/replace those words/    # same as s:samespace
+
+Note that C<ss///> is defined in terms of C<:ss>, so:
+
+    $_ = "a b\nc\td";
+    ss/b c d/x y z/;
+
+ends up with a value of "C<a x\ny\tz>".
 
 =item *
 
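For readers who want to see the slot mapping in action, here is a rough
Python sketch of the :samespace idea described in the hunk above.  It is
not Perl 6 and not the specified algorithm: the function name
samespace_subst, the word-list argument, and the use of \s+ as a stand-in
for <ws> are all assumptions made for illustration.  It handles only the
simple cases (equal slots, or extra slots on the right); when the right
side has fewer slots it simply drops the leftover whitespace instead of
applying the pecking order.

```python
import re


def samespace_subst(text, pattern_words, replacement):
    """Hypothetical helper: substitute once, carrying matched whitespace over.

    pattern_words stands in for a sigspace pattern such as /b c d/;
    each gap between words is approximated by a captured \\s+ run.
    """
    # One capture group per whitespace slot on the left-hand side.
    regex = r"(\s+)".join(re.escape(word) for word in pattern_words)

    def fill(match):
        matched_ws = list(match.groups())
        # Split the replacement so every single whitespace char is its own slot.
        parts = re.split(r"(\s)", replacement)
        out = []
        for part in parts:
            if re.fullmatch(r"\s", part) and matched_ws:
                # Copy the matched whitespace into the corresponding slot.
                out.append(matched_ws.pop(0))
            else:
                # Words, and any surplus right-hand slots, remain themselves.
                out.append(part)
        return "".join(out)

    return re.sub(regex, fill, text, count=1)


if __name__ == "__main__":
    print(repr(samespace_subst("a b\nc\td", ["b", "c", "d"], "x y z")))
```

With the input from the commit's own example, the newline and tab matched
between "b", "c" and "d" land in the two slots of "x y z".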
