This and other RFCs are available on the web at http://dev.perl.org/rfc/ =head1 TITLE Arrays: Efficient Array Loops =head1 VERSION Maintainer: Buddha Buck <[EMAIL PROTECTED]> Date: 8 Sep 2000 Last Modified: 21 Sep 2000 Mailing List: [EMAIL PROTECTED] Number: 207 Version: 2 Status: Developing =head1 ABSTRACT This RFC proposes a notation for creating efficient implicit loops over multidimensional arrays. It introduces the notation |i for an index iterator for arrays, allowing temporary multidimensional arrays to be created on the fly. =head1 CHANGES Version 2 is an almost complete rewrite, based on concerns and enhancements that arose in discussion. I was also unhappy with the language in the original. The primary semantic changes are: * The syntax formerly known as "iterators" is now known as "looping indices" * |i is interpreted as "scalar-like", rather than "list-like". The examples are now (to the best of my knowledge) consistant with that interpretation. * The scoping mechanism is enhanced. It is now clearer what the iterators mean in the various contexts within perl. I believe the current version is much more understandable. =head1 DESCRIPTION Consider the problem of multiplying together two 2-dimensional tensors. In standard notation, this would be symbolized by Cijkl = Aij * Bkl where the letters i, j, k and l are written as subscripts and represent the indices of their respective tensors. To accomplish that same multiplication in Perl, one needs to write (using RFC 204 notation): for my $i = (0..2) { #assuming 3x3 tensors for my $j = (0..2) { for my $k = (0..2) { for my $l = (0..2) { $c[[$i,$j,$k,$l]] = $a[[$i,$j]] * $b[[$k,$l]] ; } } } } While this is not particularly difficult, it is clumsy, and slow. It could be hidden in a subroutine, but then one lacks the flexibility to write similar equations on the fly. For instance, transposition is easily written as Tij = Aji. Such notation is assumed to be true for all legal values of i and j -- in effect, it loops over all legal values of i and j to compute the results. Furthermore, such notation along with appropriate restrictions on use could allow Perl to create optimised loops. This RFC proposes a similar notation, using |name as a notation for looping indices like i and j above. =head2 Details This RFC proposes that the entire The prefix | signifies that |i is an "looping index" within the statement |i appears in. More than one looping index can appear within one statement. The "scope" of a set of looping indices is the smallest subexpression in the statement that contains all the looping indices in the statement. If the scope is not in list or void context, the scope is expanded until it is in list or void context. The scope of a set of looping indices will never exceed one statement. A set of looping indices generates an implicit loop (or nested loops) surrounding the scope of the indices. If used in list context, the implied loop creates a temporary array whose dimensions and bounds are determined by the number and range of the looping indices. =over 4 # In list context $dotproduct = reduce ^_+^_, 0, $a[|i] * $b[|i]; @tensorproduct = $a[[|i,|j]] * $b[[|k,|l]]; =back When there are multiple looping indices in an expression in list context, the order of the indices in the temporary array is determined by scanning the expression left-to-right. If used in a void context, the implied loop does not create a temporary array, but rather expects the side effects (if any) to do the real work. =over 4 # In void context $c[[|i,|j,|k,|l]] = $a[[|i,|j]]*$b[[|k,|l]]; # compare with loop above $t[[|i,|j]] = $a[[|j,|i]]; $product[[|i, |j]] = $a[[|i,|k]] * $b [[|k,|j]]; $dotproduct += $a[|i] * $b[|i]; # tmtowdi $stack = $a[|i]; # $stack->STORE is TIEd to $stack->push($) =back As the two example shows, assignment to a scalar is assumed to be in scalar context, not list, and so the scope of the looping indices is expanded to include the LHS of the assignment. Looping indices can also have a user-defined range, using the syntax "|i=@list". This would create a bounding loop of the form "for |i (@list) {...}" around the expression. When no list explicitly given, |i acts as if (0..) was the specified list. The range for a looping index can be specified anywhere in the statement, but only one range can be given per looping index. =over 4 # take the upper-triangle of a square array $uppertri[[|i,|j]] = 0; # clear array to begin with $uppertri[[|i,|j]] = $a[[|i,|j=(0..|i)]]; =back Strictly speaking, |i does take on all the values in @list (or (0..)), but rather solely those values which lead to valid (in bounds) array indices. =over 4 # "unriffle" two arrays. There are probably better ways to do this ($a[|i], $b[|i]) = ($c[ 2*|i ], $c[ 2*|i + 1 ]); $average[|i] = ($a[|i-1] + $a[|i] + $a[|i+1])/3; =back In the first example, |i will never take on values that would cause 2*|i+1 to be out of bounds for $c. As mentioned above, using multiple looping indices will cause a nested loop. The order of nesting the loops is not specified here, but any interdependencies among the indices must be satisfied. In most of the examples above, the loops caused by the multiple iterators are independent. However, in the "upper triangle" example, since the range of |j depends on the current value of |i, |i must be the "outer loop". In expressions containing looping indices and RFC205-style Cartesian-product array slices (e.g., $matrix[|x;@y]), each explicit (or implicit) non-singleton argument to ; acts as if it were an anonymous iterator using the explicit (or implicit (0..)) range list. Each anonymous iterator is independent of each other. This can be very powerfull, especially when combined with the * operand to ;: =over 4 # Generalized tensor multiplication: @product = $a[|i;*] * $b[|j;*]; =back The use of ; also makes it easier to express long lists of looping indices. $array[[|1,|j,|k]] is equivilant to $array[|i;|j;|k], but doesn't use as much punctuation. Looping indices aren't restricted to being used solely as array indices, as the "unriffle" example showed. But each looping index has to be used in an array index for at least one array. =over 4 # find $nth triangular number my $triangle = 0; $triangle += |i=(0..$n); # compile-time error: |i not used as index # Fill a multiplication table my @multtable : shape(12,12); $multtable[|i;|j] = |i*|j; # OK =back =head Lazy Evaluation Assuming that lazy evaluation is used in other parts of Perl6, it would be nice if these loops could also be evaluated lazily. In list context, this could be done by creating an anonymous function to evaluate the looped expression at the desired indices: =over 4 $a[|i]*$b[|j] # in list context # becomes sub { my ($i,$j) = @_; $a[$i]*$b[$j]; } =back This anonymous function can be TIEd to the resulting anonymous array, so all array lookups would invoke this function. Since TIEing is supposed to be improved in Perl6, this would be a reasonable way to do it. If other lazy evaluation mechanisms work in Perl6, they could be used instead. I am uncertain if lazy evaluation makes sense in void context. =head2 Examples: =over 2 $t[[|i,|j]] = $a[[|j,|i]]; # transpose 2-d @a =back would be equivilant to: =over 2 { my $i; my $j for $i (0..) { # last if out-of-bounds for $j (0..) { # last if out-of-bounds $t[[$i,$j]] = $a[[$j,$i]]; } } } =back This notation also allows (as a specific use) an alternative notation to the RFC 82 element-wise syntax. =over 2 #compute pairwise sum, pairwise product, pairwise difference... @sum = @a[[|i,|j,|k,|l]] + @b[[|i;|j;|k;|l]]; # RFC82: @sum = @a + @b @prod= @a[[|i,|j,|k,|l]] * @b[[|i;|j;|k;|l]]; # @prod = @a * @b @diff= @a[[|i,|j,|k,|l]] - @b[[|i;|j;|k;|l]]; # @diff = @a - @b =back RFC 82 syntax is simpler, but this is perl, so There Is More Than One Way To Do It. Note that if the "Lazy Evaluation" schema mentioned above is adopted, then these sums, products, and differences could be automagically lazy as well. =head1 IMPLEMENTATION The simplest implementation would be to convert at compile-time (or parse time) void-context looped iterator scopes to loops analogous to the above examples, and convert list-context looped iterator scopes to valued do-blocks or invoked anonymous subroutines: =over 4 $dotproduct = reduce {^_+^_},0,$a[|i]*$b[|i]; # would be transformed into $dotproduct = reduce {^_+^_},0, sub { my $i; my @r; for $i (0..min($#a,$#b)) { $r[$i] = $a[$i] * $b[$i]; } return @r; }->(); =back A more sophisticated, preferred, implementation would take advantage of the static, known nature of the data to create a highly optimized version of the loop. Possible optimizations include: Common sub-expression elimintation, encoding internally to some non-interpreted looping construct, etc. If special 'numeric functions' are provided in Perl, then expressions with just unoverloaded operators and numeric functions could be optimised into tight compiled loops, as occurs for example with fromfunction() and ufuncs in Numeric Python: http://starship.python.net/~da/numtut/array.html#SEC8 http://starship.python.net/~da/numtut/array.html#SEC13 For lazy evaluation, the value of the expression at any given set of indices is easy to calculate. However the lazy evaluation mechanism works, it can use this property to calculate the appropriate values. =head1 REFERENCES RFC 203: Notation for declaring and creating arrays RFC 204: Notation for indexing arrays with an LOL as an index RFC 205: New operator ';' for creating array slices