Author: lwall
Date: 2009-03-23 02:27:32 +0100 (Mon, 23 Mar 2009)
New Revision: 25970

Modified:
   docs/Perl6/Spec/S05-regex.pod
Log:
clarifications to .caps and .chunks requested by moritz++


Modified: docs/Perl6/Spec/S05-regex.pod
===================================================================
--- docs/Perl6/Spec/S05-regex.pod       2009-03-23 00:41:38 UTC (rev 25969)
+++ docs/Perl6/Spec/S05-regex.pod       2009-03-23 01:27:32 UTC (rev 25970)
@@ -14,9 +14,9 @@
    Maintainer: Patrick Michaud <pmich...@pobox.com> and
                Larry Wall <la...@wall.org>
    Date: 24 Jun 2002
-   Last Modified: 18 Mar 2009
+   Last Modified: 22 Mar 2009
    Number: 5
-   Version: 93
+   Version: 94
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> rather than "regular
@@ -2562,15 +2562,35 @@
 =item *
 
 As described above, a C<Match> in list context returns its positional
-captures.  However, sometimes you'd rather get a flat list of tokens in
-the order they occur in the text.  The C<.caps> method returns a list
-of every captured item, regardless of how it was otherwise bound into
-named or numbered captures.  The C<.chunks> method returns the captures
-as well as all the interleaved "noise" between the captures. [Conjecture:
-we could also have C<.deepcaps> and C<.deepchunks> that recursively expand
-any capture containing submatches.  Presumably each returned chunk would
-come equipped with some method to discover its "pedigree" in the parse tree.]
+captures.  However, sometimes you'd rather get a flat list of tokens
+in the order they occur in the text.  The C<.caps> method returns
+a list of every capture in order, regardless of how it was otherwise
+bound into named or numbered captures.  (Other than order, there is
+no new information here; all the elements of the list are the very
+same C<Match> objects that are bound elsewhere.)  The bindings are actually
+returned as key/value pairs where the key is the name or number under which
+the match object was bound, and the value is the match object itself.
 
+In addition to returning those captured C<Match> objects, the
+C<.chunks> method also returns all the interleaved "noise" between
+the captures.  As with C<.caps>, the list elements are in the order
+they were originally in the text.  The interleaved bits are also returned
+as pairs, where the key is '~' and the value
+is a simple C<Match> object containing only the string, even if unbound
+subrules such as C<.ws> were called to traverse the text in the first
+place.  Calling C<.ast> on such a C<Match> object always returns a C<Str>.
+
+A warning will be issued if either C<.caps> or C<.chunks> discovers
+that it has overlapping bindings.  In the absence of such overlap,
+C<.chunks> guarantees to map every part of its matched string (between
+C<.from> and C<.to>) to exactly one element of its returned matches,
+so coverage is complete.
+
+[Conjecture: we could also have C<.deepcaps> and C<.deepchunks> that
+recursively expand any capture containing submatches.  Presumably the
+keys of such returned chunks would indicate the "pedigree" of bindings
+in the parse tree.]
+
 =item *
 
 All match attempts--successful or not--against any regex, subrule, or

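The ordering, overlap-warning, and coverage rules for .caps and .chunks can be modeled outside Perl 6. What follows is a minimal Python sketch, not Perl 6 code and not any implementation's actual algorithm. It assumes captures are hypothetical (key, start, end) tuples with exclusive ends, and it uses plain strings as values where the spec returns Match objects. The helper names caps, chunks, and _in_text_order are illustrative only.

```python
import warnings
from typing import List, Tuple, Union

Key = Union[str, int]            # capture name or number (hypothetical model)
Capture = Tuple[Key, int, int]   # (key, start, end); end is exclusive


def _in_text_order(captures: List[Capture]) -> List[Capture]:
    """Sort captures by position; warn if any binding overlaps an earlier one."""
    ordered = sorted(captures, key=lambda c: (c[1], c[2]))  # stable, keys never compared
    max_end = 0
    for key, start, end in ordered:
        if start < max_end:
            warnings.warn(f"overlapping binding for {key!r}")
        max_end = max(max_end, end)
    return ordered


def caps(captures: List[Capture], text: str) -> List[Tuple[Key, str]]:
    """Model of .caps: every capture in text order, as (key, value) pairs."""
    return [(key, text[start:end]) for key, start, end in _in_text_order(captures)]


def chunks(captures: List[Capture], text: str, frm: int, to: int) -> List[Tuple[Key, str]]:
    """Model of .chunks: captures plus the '~' noise between them, covering frm..to."""
    result: List[Tuple[Key, str]] = []
    pos = frm
    for key, start, end in _in_text_order(captures):
        if start > pos:                      # unbound text before this capture
            result.append(("~", text[pos:start]))
        result.append((key, text[start:end]))
        pos = max(pos, end)                  # overlap only warns; it is not repaired
    if pos < to:                             # trailing unbound text
        result.append(("~", text[pos:to]))
    return result
```

For the text "foo = 42;" with a named capture lhs over "foo" and a numbered capture 0 over "42", chunks yields ("lhs", "foo"), ("~", " = "), (0, "42"), ("~", ";"). When no bindings overlap, joining the values reproduces the matched string exactly, which is the coverage guarantee the spec text describes.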