Author: larry
Date: Sat Dec 23 18:37:23 2006
New Revision: 13502

Modified:
   doc/trunk/design/syn/S03.pod

Log:
More attempts to make smartmatch semantics consistent with multiple dispatch.
Regex matching now does not "autogrep" an Array since we're trying to
extend Regex matching to sequences of other than just Str.
Defined cat() in scalar context to fake up a lazy string implementation type.
(Still basically a no-op in list context, since lists are already lazy.)


Modified: doc/trunk/design/syn/S03.pod
==============================================================================
--- doc/trunk/design/syn/S03.pod        (original)
+++ doc/trunk/design/syn/S03.pod        Sat Dec 23 18:37:23 2006
@@ -14,7 +14,7 @@
   Date: 8 Mar 2004
   Last Modified: 23 Dec 2006
   Number: 3
-  Version: 81
+  Version: 82
 
 =head1 Changes to Perl 5 operators
 
@@ -608,52 +608,58 @@
     .{Any} .<string> .[number]  .method
     Class Subset Enum Role      Type
     Subst                       Regex
-    Buf Char                    Str
+    Buf Char LazyStr            Str
     Int UInt etc.               Num
 
 Note that all types are scalarized.  Both C<~~> and C<given>/C<when>
 provide scalar contexts to their arguments.  (You can always
 hyperize C<~~> explicitly, though.)  So both C<$_> and C<$x> here
-are potentially container objects.  The first possible match in this
-table is used.  By definition all normal arguments can be matched to
-at least one of these entries.
+are potentially container objects.  The first section contains
+privileged syntax; if a match can be done via one of those entries,
+it will be.  Otherwise the rest of the table is used, and the match
+will be dispatched according to the normal rules of multiple dispatch;
+however, the optimizer is allowed to assume that no C<< infix:<~~> >>
+operators are added at run time, so if the argument types are evident
+at compile time, the jump table can be optimized.  By definition all
+normal arguments can be matched to at least one of the entries below.
 
-    $_      $x        Type of Match Implied    Matching Code
+    $_      $x        Type of Match Implied    Match if
     ======  =====     =====================    =============
-    Any     Code:($)  scalar sub truth         match if $x($_)
-    Any     .method   method truth*            match if $_.method
-
-    Hash    Hash      hash keys identical sets match if $_.keys === $x.keys
-    Hash    Array     hash value slice truth   match if $_{any(@$x)}
-    Hash    Junction  hash key slice existence match if $_.exists($x)
-    Hash    Regex     hash key grep            match if any($_.keys) === /$x/
-
-    Array   Array     arrays are comparable    match if $_ »===« $x
-    Array   Junction  list intersection        match if any(@$_) === $x
-    Array   Regex     array grep               match if any(@$_) === /$x/
-    Array   Num       array contains number    match if any($_) == $x
-    Array   Str       array contains string    match if any($_) eqv $x
-    Array   Buf       array equivalent to buf  match if $_ eqv Array($x)
-    Array   Set       array equivalent to set  match if Set($_) === $x
-
-    Num     NumRange  in numeric range         match if $min <= $_ <= $max
-
-    Code    Signature signature compatibility* match if $_ is a subset of $x
-  Signature Signature signature compatibility  match if $_ is a subset of $x
-
-    Hash    Any       hash entry existence     match if exists $_{$x}
-    Array   Any       array contains item*     match if any($_) === $x
-    Any     Signature parameter binding        match if $_ can bind to $x
-    Any     Range     in range                 match if [!after] $x.min,$_,$x.max
-    Any     Type      type membership          match if $_.does($x)
-    Any     Regex     pattern match            match if $_ ~~ /$x/
-    Any     Num       numeric equality         match if $_ == $x
-    Any     Str       string equality          match if $_ eqv $x
-    Any     Code:()   simple closure truth*    match if $x() (ignoring $_)
-    Any     boolean   simple expression truth* match if $x.true given $_
-    Any     undef     undefined                match unless defined $_
-    Any     *         default                  match anything
-    Any     Any       run-time dispatch        match if infix:<~~>($_, $x)
+    Any     Code:($)  scalar sub truth         $x($_)
+    Any     .method   method truth*            $_.method
+    Any     boolean   simple expression truth* $x.true given $_
+    Any     undef     undefined                not defined $_
+    Any     *         default                  True
+
+    Num     Num       numeric equality         $_ == $x
+    Num     Junction  numeric equality         $_ == $x
+    Str     Str       string equality          $_ eqv $x
+    Str     Junction  string equality          $_ eqv $x
+
+    Hash    Hash      hash keys identical sets $_.keys === $x.keys
+    Hash    Array     hash value slice truth   $_{any(@$x)}
+    Hash    Junction  hash key slice existence $_.exists($x)
+    Hash    Regex     hash key grep            any($_.keys) === /$x/
+
+    Array   Array     arrays are comparable    $_ »===« $x
+    Array   Regex     match array like string  cat(@$_) ~~ $x
+    Array   Junction  list intersection        any(@$_) ~~ $x
+    Array   Num       array contains number    any($_) == $x
+    Array   Str       array contains string    any($_) eqv $x
+    Array   Buf       array equivalent to buf  $_ eqv Array($x)
+    Array   Set       array equivalent to set  Set($_) === $x
+
+    Code    Signature signature compatibility* $_ is a subset of $x
+  Signature Signature signature compatibility  $_ is a subset of $x
+
+    Hash    Any       hash entry existence     exists $_{$x}
+    Array   Any       array contains item*     any($_) === $x
+    Any     Signature parameter binding        $_ can bind to $x
+    Any     Range     in range                 [!after] $x.min,$_,$x.max (etc.)
+    Any     Regex     pattern match            $_.match($x)
+    Any     Type      type membership          $_.does($x)
+    Any     Code:()   simple closure truth*    $x() (ignoring $_)
+    Any     Any       run-time dispatch        infix:<~~>($_, $x)
 
 Matches marked with * are non-reversible, typically because C<~~> takes
 its left side as the topic for the right side, and sets the topic to a
@@ -702,17 +708,20 @@
 
 The C<~~> operator is intended primarily for compile-time resolution,
 and if the types of the operands resolve at compile time according
-to the table above, any existing C<< infix:<~~> >> routines are
+to the table above, any C<< infix:<~~> >> routines added later are
 completely ignored.  If the types cannot be matched at compile time,
-one attempt is made to multiply dispatch to all C<< infix:<~~> >>
-infix definitions.  The standard C<< infix:<~~> >> definitions are
-intended to reproduce as closely as possible the compile-time table above,
-but it can do this based only on the run-time types of the arguments.
-Therefore only the entries above that indicate a type on both sides
-can be dispatched that way.  (You can tell those because both sides
-start with a capital letter.  So multiple dispatch ignores the
-".method", "boolean", "undef", and "*" entries, which are recognized
-syntactically, not by type.)
+(that is, if the arguments match only the Any/Any rule at compile
+time), the match is deferred to a true run-time multiple dispatch to
+all C<< infix:<~~> >> infix definitions that exist at the moment.
+
+The run-time C<< infix:<~~> >> definitions are intended to reproduce
+as closely as possible the compile-time table above, but they can do
+this based only on the run-time types of the arguments.  Therefore
+only the entries above that indicate a type on both sides can be
+dispatched that way.  (You can tell those because both sides start
+with a capital letter.  So multiple dispatch ignores the ".method",
+"boolean", "undef", and "*" entries in the first section, which are
+recognized syntactically, not by type.)
 
 If there is no appropriate signature match under the rules of multiple
 dispatch, the most generic multi definition of C<< infix:<~~> >>
@@ -733,6 +742,36 @@
 the replication count of those unique keys.  (Obviously, a C<Set> can
 have only 0 or 1 replication because of the guarantee of uniqueness).
 
+The C<LazyStr> type allows you to have an infinitely extensible string.
+You can match an array or iterator by feeding it to a C<LazyStr>,
+which is essentially a C<Str> interface over an iterator of some sort.
+Then a C<Regex> can be used against it as if it were an ordinary
+string.  The C<Regex> engine can ask the string if it has more
+characters, and the string will extend itself if possible from its
+underlying iterator.  (Note that such strings have an indefinite
+number of characters, so if you use C<.*> in your pattern, or if you
+ask the string how many characters it has in it, or if you even print
+the whole string, it may feel compelled to slurp in the rest of
+the string, which may or may not be expeditious.)
+
+The C<cat> operator in scalar context takes a (potentially lazy) list
+and returns a C<LazyStr> object, so you can search a gather like this:
+
+    my $lazystr := cat gather for @foo { take .bar }
+
+    $lazystr ~~ /pattern/;
+
+The C<LazyStr> interface allows the regex to match element boundaries
+with the C<< <,> >> assertion, and the C<StrPos> objects returned by
+the match can be broken down into element index and position within
+that list element.  If the underlying data structure is a mutable
+array, changes to the array (such as by C<shift> or C<pop>) are tracked
+by the C<LazyStr> so that the element numbers remain correct.  Strings,
+arrays, lists, sequences, captures, and tree nodes can all be pattern
+matched by regexes or by signatures more or less interchangeably.
+However, the structure searched is not guaranteed to maintain a C<.pos>
+unless you are searching a C<Str> or C<LazyStr>.
+
 =head1 Meta operators
 
 Perl 6's operators have been greatly regularized, for instance, by
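
Illustrative note (not part of the commit): the LazyStr paragraphs above describe a string interface over an iterator that extends itself on demand and can report positions as an element index plus an offset within that element. The following is a minimal sketch of that idea in Python, not Perl 6, and not the implementation the spec has in mind. The class body, the method names find and locate, and the use of plain substring search in place of a Regex engine are all assumptions made for illustration.

```python
import bisect
from itertools import count


class LazyStr:
    """Hypothetical model: a string-like view over an iterator of elements."""

    def __init__(self, elements):
        self._it = iter(elements)
        self._buf = []      # characters pulled from the iterator so far
        self._starts = []   # character offset at which each element begins

    def _extend(self):
        """Pull one more element; return False if the iterator is exhausted."""
        try:
            chunk = str(next(self._it))
        except StopIteration:
            return False
        self._starts.append(len(self._buf))
        self._buf.extend(chunk)
        return True

    def char_at(self, i):
        """Return the character at offset i, extending lazily, or None."""
        while i >= len(self._buf):
            if not self._extend():
                return None
        return self._buf[i]

    def find(self, needle):
        """Leftmost offset of needle, pulling only as much as needed, or -1."""
        if not needle:
            return 0
        start = 0
        while True:
            # Make sure the whole candidate window has been pulled.
            if self.char_at(start + len(needle) - 1) is None:
                return -1
            if all(self._buf[start + k] == c for k, c in enumerate(needle)):
                return start
            start += 1

    def locate(self, pos):
        """Break an already-pulled offset into (element index, offset in element)."""
        idx = bisect.bisect_right(self._starts, pos) - 1
        return idx, pos - self._starts[idx]


# An unbounded source: the elements "1", "2", "3", ... concatenated.
s = LazyStr(str(n) for n in count(1))
hit = s.find("1011")
elements_pulled = len(s._starts)
```

Searching the unbounded source terminates because the search only asks for as many characters as it needs. A search for something that never occurs would keep pulling forever, which is the same caveat the text gives for C<.*>.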
