RFC 207 (v2) Arrays: Efficient Array Loops

Perl6 RFC Librarian Thu, 21 Sep 2000 15:41:36 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Arrays: Efficient Array Loops

=head1 VERSION

  Maintainer: Buddha Buck <[EMAIL PROTECTED]>
  Date: 8 Sep 2000
  Last Modified: 21 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 207
  Version: 2
  Status: Developing

=head1 ABSTRACT

This RFC proposes a notation for creating efficient implicit loops over
multidimensional arrays. It introduces the notation |i for an index
iterator for arrays, allowing temporary multidimensional arrays to be
created on the fly.

=head1 CHANGES

Version 2 is an almost complete rewrite, based on concerns and
enhancements that arose in discussion.  I was also unhappy with the
language in the original.  The primary semantic changes are:

* The syntax formerly known as "iterators" is now known as "looping
indices"

* |i is interpreted as "scalar-like", rather than "list-like". The
examples are now (to the best of my knowledge) consistent with that
interpretation.

* The scoping mechanism is enhanced.  It is now clearer what the
looping indices mean in the various contexts within Perl.

I believe the current version is much more understandable.

=head1 DESCRIPTION

Consider the problem of multiplying together two 2-dimensional tensors. In
standard notation, this would be symbolized by

    Cijkl = Aij * Bkl

where the letters i, j, k and l are written as subscripts and represent
the indices of their respective tensors. To accomplish that same
multiplication in Perl, one needs to write (using RFC 204 notation):

    for my $i (0..2) {    # assuming 3x3 tensors
      for my $j (0..2) {
        for my $k (0..2) {
          for my $l (0..2) {
            $c[[$i,$j,$k,$l]] = $a[[$i,$j]] * $b[[$k,$l]];
          }
        }
      }
    }

While this is not particularly difficult, it is clumsy, and slow. It could
be hidden in a subroutine, but then one lacks the flexibility to write
similar equations on the fly. For instance, transposition is easily
written as Tij = Aji. Such notation is assumed to be true for all legal
values of i and j -- in effect, it loops over all legal values of i and j
to compute the results. Furthermore, such notation along with appropriate
restrictions on use could allow Perl to create optimised loops.

This RFC proposes a similar notation, using |name as a notation for
looping indices like i and j above.

=head2 Details

The prefix | signifies that |i is a "looping index" within the
statement in which |i appears.  More than one looping index can appear
within one statement.

The "scope" of a set of looping indices is the smallest subexpression
in the statement that contains all the looping indices in the
statement.  If the scope is not in list or void context, the scope is
expanded until it is in list or void context.  The scope of a set of
looping indices will never exceed one statement.

A set of looping indices generates an implicit loop (or nested loops)
surrounding the scope of the indices.

If used in list context, the implied loop creates a temporary array
whose dimensions and bounds are determined by the number and range of
the looping indices.

=over 4

     # In list context
     $dotproduct = reduce ^_+^_, 0, $a[|i] * $b[|i];
     @tensorproduct = $a[[|i,|j]] * $b[[|k,|l]];

=back

When there are multiple looping indices in an expression in list
context, the order of the indices in the temporary array is determined by
scanning the expression left-to-right.
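
As a rough sketch of the list-context semantics (in Python rather than
Perl, only because it can be run today), the following builds the
temporary arrays that the two examples above would produce.  The helper
names dot_product_terms and tensor_product are hypothetical, and
rectangular, 0-based nested lists are assumed.

```python
from functools import reduce

def dot_product_terms(a, b):
    # One looping index |i: the temporary array has one dimension,
    # bounded by the shorter of the two input arrays.
    return [a[i] * b[i] for i in range(min(len(a), len(b)))]

def tensor_product(a, b):
    # Four looping indices, scanned left to right as |i, |j, |k, |l,
    # so the temporary array is indexed as result[i][j][k][l].
    return [[[[a[i][j] * b[k][l]
               for l in range(len(b[0]))]
              for k in range(len(b))]
             for j in range(len(a[0]))]
            for i in range(len(a))]

a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
t = tensor_product(a, b)

# reduce plays the role of "reduce ^_+^_, 0, ..." in the first example.
dot = reduce(lambda x, y: x + y, dot_product_terms([1, 2, 3], [4, 5, 6]), 0)
```

The order of the subscripts of t follows the left-to-right order in
which the looping indices appear in the expression.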

If used in a void context, the implied loop does not create a
temporary array, but rather expects the side effects (if any) to do
the real work.

=over 4

     # In void context
     $c[[|i,|j,|k,|l]] = $a[[|i,|j]]*$b[[|k,|l]]; # compare with loop above

     $t[[|i,|j]] = $a[[|j,|i]];
     $product[[|i, |j]] = $a[[|i,|k]] * $b [[|k,|j]];
     $dotproduct += $a[|i] * $b[|i];  # TMTOWTDI
     $stack = $a[|i]; # $stack->STORE is TIEd to $stack->push($)

=back
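
A minimal sketch of what the void-context forms might expand to, again
in Python and under the assumption that the target array is already
shaped.  The function names are hypothetical.  No temporary array is
built; the assignment inside the loop does the work.

```python
def transpose_into(t, a):
    # $t[[|i,|j]] = $a[[|j,|i]];
    # Loop over every (i, j) that is in bounds for the target t.
    for i in range(len(t)):
        for j in range(len(t[0])):
            t[i][j] = a[j][i]

def dot_product(a, b):
    # $dotproduct += $a[|i] * $b[|i];
    # The scalar on the left is inside the implied loop, so it
    # accumulates one term per value of i.
    total = 0
    for i in range(min(len(a), len(b))):
        total += a[i] * b[i]
    return total

a = [[1, 2, 3], [4, 5, 6]]
t = [[0] * 2 for _ in range(3)]   # pre-shaped 3x2 target (assumption)
transpose_into(t, a)
```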

As the last two examples show, assignment to a scalar is assumed to be
in scalar context, not list context, and so the scope of the looping
indices is expanded to include the LHS of the assignment.

Looping indices can also have a user-defined range, using the syntax
"|i=@list".  This would create a bounding loop of the form "for |i
(@list) {...}" around the expression.  When no list is explicitly given,
|i acts as if (0..) was the specified list.  The range for a looping
index can be specified anywhere in the statement, but only one range
can be given per looping index.

=over 4

     # take the upper-triangle of a square array
     $uppertri[[|i,|j]] = 0;  # clear array to begin with
     $uppertri[[|i,|j]] = $a[[|i,|j=(0..|i)]];

=back

Strictly speaking, |i does not take on all the values in @list (or
(0..)), but rather only those values which lead to valid (in-bounds)
array indices.

=over 4

    # "unriffle" two arrays.  There are probably better ways to do this
    ($a[|i], $b[|i]) = ($c[ 2*|i ], $c[ 2*|i + 1 ]);
    $average[|i] = ($a[|i-1] + $a[|i] + $a[|i+1])/3;

=back

In the first example, |i will never take on values that would cause
2*|i+1 to be out of bounds for $c.
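
The bounds rule for the "unriffle" example can be sketched as follows.
This is Python, the name unriffle is hypothetical, and it assumes that
the targets $a and $b grow as needed, so only $c constrains |i.

```python
def unriffle(c):
    # i is valid only while both 2*i and 2*i + 1 are in bounds for c,
    # so a trailing unpaired element of c is never read.
    n = len(c) // 2
    a = [c[2 * i] for i in range(n)]
    b = [c[2 * i + 1] for i in range(n)]
    return a, b

evens, odds = unriffle([1, 2, 3, 4, 5])
```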

As mentioned above, using multiple looping indices  will cause a
nested loop.  The order of nesting the loops is not specified here,
but any interdependencies among the indices must be satisfied.

In most of the examples above, the loops caused by the multiple
iterators are independent.  However, in the "upper triangle" example,
since the range of |j depends on the current value of |i, |i must be
the "outer loop".
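
The dependency in the "upper triangle" example might be expanded along
these lines.  This is a Python sketch that mirrors the two statements
literally; the name triangle_copy is hypothetical and a square nested
list is assumed.

```python
def triangle_copy(a):
    n = len(a)
    # $uppertri[[|i,|j]] = 0;  clear the array to begin with
    result = [[0] * n for _ in range(n)]
    # |j=(0..|i): the range of j depends on the current i,
    # so i has to be the outer loop.
    for i in range(n):
        for j in range(0, i + 1):   # Perl's 0..$i is inclusive
            result[i][j] = a[i][j]
    return result

tri = triangle_copy([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Only the elements with j <= i are copied; all others keep the value 0
from the first statement.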

In expressions containing looping indices and RFC205-style Cartesian-product
array slices (e.g., $matrix[|x;@y]), each explicit (or implicit) non-singleton
argument to ; acts as if it were an anonymous iterator using the explicit (or
implicit (0..)) range list.  The anonymous iterators are independent of
one another.  This can be very powerful, especially when combined with
the * operand to ;:

=over 4

     # Generalized tensor multiplication:
     @product = $a[|i;*] * $b[|j;*];

=back

The use of ; also makes it easier to express long lists of looping
indices.  $array[|i;|j;|k] is equivalent to $array[[|i,|j,|k]], but
uses less punctuation.

Looping indices aren't restricted to being used solely as array
indices, as the "unriffle" example showed.  But each looping index has
to be used in an array index for at least one array.

=over 4

     # find $nth triangular number
     my $triangle = 0;
     $triangle += |i=(0..$n);   # compile-time error: |i not used as index

     # Fill a multiplication table
     my @multtable : shape(12,12);
     $multtable[|i;|j] = |i*|j; # OK

=back

=head2 Lazy Evaluation

Assuming that lazy evaluation is used in other parts of Perl6, it
would be nice if these loops could also be evaluated lazily.

In list context, this could be done by creating an anonymous function
to evaluate the looped expression at the desired indices:

=over 4

     $a[|i]*$b[|j]  # in list context
     # becomes

     sub { my ($i,$j) = @_; $a[$i]*$b[$j]; }

=back

This anonymous function can be TIEd to the resulting anonymous array,
so all array lookups would invoke this function.  Since TIEing is
supposed to be improved in Perl6, this would be a reasonable way to do
it.

If other lazy evaluation mechanisms work in Perl6, they could be used
instead.
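
To make the idea concrete, here is a small Python sketch of an
array-like object backed by such a function.  The class LazyArray and
its shape argument are hypothetical and stand in for whatever TIE
mechanism Perl6 ends up providing; elements are computed only when they
are looked up.

```python
class LazyArray:
    """Array-like view that computes each element on demand."""

    def __init__(self, func, shape):
        self.func = func      # maps a tuple of indices to a value
        self.shape = shape    # upper bound (exclusive) per dimension

    def __getitem__(self, index):
        # Reject out-of-bounds lookups before calling the function.
        for pos, bound in zip(index, self.shape):
            if not 0 <= pos < bound:
                raise IndexError(index)
        return self.func(*index)

a = [1, 2, 3]
b = [10, 20]

# Corresponds to $a[|i]*$b[|j] in list context.
outer = LazyArray(lambda i, j: a[i] * b[j], (len(a), len(b)))
value = outer[2, 1]   # computed here, not when outer was created
```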

I am uncertain if lazy evaluation makes sense in void context.

=head2 Examples:

=over 2

   $t[[|i,|j]] = $a[[|j,|i]];  # transpose 2-d @a

=back

would be equivalent to:

=over 2

   {
     my $i; my $j;
     for $i (0..) { # last if out-of-bounds
       for $j (0..) { # last if out-of-bounds
         $t[[$i,$j]] = $a[[$j,$i]];
       }
     }
   }

=back

This notation also allows (as a specific use) an alternative notation
to the RFC 82 element-wise syntax.

=over 2

   # compute pairwise sum, pairwise product, pairwise difference...
   @sum = @a[[|i,|j,|k,|l]] + @b[[|i,|j,|k,|l]];  # RFC82: @sum  = @a + @b
   @prod= @a[[|i,|j,|k,|l]] * @b[[|i,|j,|k,|l]];  #        @prod = @a * @b
   @diff= @a[[|i,|j,|k,|l]] - @b[[|i,|j,|k,|l]];  #        @diff = @a - @b

=back

RFC 82 syntax is simpler, but this is Perl, so There Is More Than One
Way To Do It.

Note that if the "Lazy Evaluation" scheme mentioned above is adopted,
then these sums, products, and differences could be automagically lazy
as well.

=head1 IMPLEMENTATION

The simplest implementation would be to convert at compile-time (or
parse time) void-context looped iterator scopes to loops analogous to
the above examples, and convert list-context looped iterator scopes to
valued do-blocks or invoked anonymous subroutines:

=over 4

     $dotproduct = reduce {^_+^_},0,$a[|i]*$b[|i];

     # would be transformed into

     $dotproduct = reduce {^_+^_},0,
        sub { my $i; my @r;
              for $i (0..min($#a,$#b)) {
               $r[$i] = $a[$i] * $b[$i];
              }
              return @r;
            }->();

=back

A more sophisticated, preferred, implementation would take advantage
of the static, known nature of the data to create a highly optimized
version of the loop.

Possible optimizations include: common sub-expression elimination,
encoding internally to some non-interpreted looping construct, etc. If
special 'numeric functions' are provided in Perl, then expressions
with just unoverloaded operators and numeric functions could be
optimised into tight compiled loops, as occurs for example with
fromfunction() and ufuncs in Numeric Python:

   http://starship.python.net/~da/numtut/array.html#SEC8
   http://starship.python.net/~da/numtut/array.html#SEC13

For lazy evaluation, the value of the expression at any given set of
indices is easy to calculate. However the lazy evaluation mechanism works,
it can use this property to calculate the appropriate values.

=head1 REFERENCES

RFC 203: Notation for declaring and creating arrays

RFC 204: Notation for indexing arrays with an LOL as an index

RFC 205: New operator ';' for creating array slices