This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Lazily evaluated list generation functions


=head1 VERSION

  Maintainer: Jeremy Howard <[EMAIL PROTECTED]>
  Date: 10 August 2000
  Version: 1
  Mailing List: [EMAIL PROTECTED]
  Number: 81

=head1 ABSTRACT

This RFC proposes that the existing C<..> operator produce a lazily
evaluated list. In addition, a new operation C<:> is proposed that allows
for the generation of lazily evaluated lists based on any Perl expression.

This proposal only discusses these operators in a list context. The
current meaning of '..' in a scalar context is not affected.

=head1 DESCRIPTION

This RFC proposes that Perl incorporate a broader tool box of list
generation techniques:

=over 8

=item *

Lazy evaluation of generated lists

=item *

Introduction of C<:> to generate arbitary lists

=back

These techniques would allow programs written in Perl to follow a
structure familiar to programmers used to numerical programming
environments. It would provide a more compact notation for many common
mathematical algorithms, and give Perl important information to make key
optimisations.

=head2 Lazy evaluation of generated lists

The C<..> of previous Perls is a I<list generation operator>, which
creates a list based on its parameters:

  ($start..$stop); # ($start, $start+inc, $start+2*inc, ... $stop)

where 'inc' is 1 if $start<$stop, or -1 otherwise. The list is generated
as soon as it is declared. These makes some code rather inefficient:

  @a = (1..1000000);   # One million element list generated here
  print $a[999999];

creates a one million element list despite only using one element of it.
Under I<lazy evaluation>, elements of the list are only created when they
are required, and saved for later use. In the previous example only
$a[999999] would be calculated by interpolation (not sequentially) and
stored when using lazy evaluation.

Lists, whether generated lazily or not, are assumed to be I<stable>. That
is, the value of $a[999999] will be the same everywhere in a program,
unless @a itself is modified. This means that lazily evaluated lists
provide a handy notation for memoization, as we will see later. It is
proposed that once an element has been calculated in a list, that it is
cached for use later rather than recalculated each time.

Lazy list elements get calculated when they are output, or used in an
expression that is output. If list elements are not output then they are
never calculated.

=head2 Introduction of C<:> to generate arbitary lists

It is proposed that a new operator be added to Perl's list generation
arsenal, C<:>. The ':' character is chosen because it reflects standard
notation for array slicing, which is an important use of this operator.

C<:> is only meaningful when called in a list context, generating a lazily
evaluated list in one of 3 ways.

=over 4

=item 1. I<($start..$end:$step)>

Although earlier Perls could create ascending and descending lists
incrementing by one, other increments required an unwieldy map:

  @threes = map {3*$_} (1..5);   # (3,6,9,12,15)

which was also less than intuitive to those used to the simple slicing
notation of numerical programming languages such as Matlab and IDL.

This proposed use of C<:> is identical to C<..> without C<:>, except that
it increments by $step rather than 1. Specifically, returns a list
($start, $start+$step, $start+2*$step, ... $end). If $step does not go
into ($end-$start) exactly, the last element of the list is the largest
number in the series less than or equal to $end.

$step is any non-zero number (not necessarily an integer). $step is
optional--if it's missing then $step is the same as if the C<:> wasn't
there at all. For example:

  (1..5:2);   # (1,3,5)
  (1..5:);    # Same as (1..5)
  (1..-5:-2);  # (1,-1,-3,-5)
  (1..5:-2);  # () Empty list

This slicing notation is particularly useful for dealing with arrays,
matrices, and tensors:

  @matrix = (1,3,4,
             2,6,7);
  @column1of3 = (1..10000:3);   # (1,4,7,...10000)
  print sum(@matrix[@column1of3]);          # Prints 3
  @matrix2 = readBig3ColumnMatrixFromSomewhere();
  $column1Sum = sum(@matrix2[@column1of3]); # No need to redefine our slice!

Note that more complex slicing, masking, and indirection across
n-dimensional tensors would make the win of not having to respecify the
indexes more substantial than in this simplified example.

=item 2. I<($start:$end:$step)>

An alias for ($start..$end:$step), to keep things familiar for folks
moving over from over languages supporting similar notation.

=item 3. I<(@start:&gen:$num_steps)>

The most general form of this is to provide a syntax to create bounded or
unbounded lists whose elements are generated by any arbitary function:

  ($start, &gen($start), &gen(&gen($start)), {$steps times}...);

This is created using the notation:

  ($start:&gen:$steps);

As before, $start is required (but need not be an integer). &gen takes one
parameter, which is the value of the previous element. For example:

  @first5PowersOf2 = (1:sub {$_[0]**2}:5);   # (1,2,4,8,16)

Because lists are memoized automatically, when we later say:

  print $powersOf2[4];

The value of $powersOf2[4] is pulled from the memoization cache rather
than recalculated. What's more:

  print $powersOf2[5];

is calculated from $powersOf2[4], rather than having to recalculate all
the in-betweens again, because of that caching.

This form of list generation can not generate the Fibonnaci sequence,
because it is not defined by a single $start, and it has more than
one parameter in its generator function. This requires a more
generalised form:

  ((@start):&gen:$num_steps)

which allows us to say:

  @fib = ((1,1): ^_ + ^_: 5);   # (1,1,2,3,5)

if we use the higher-order function notation proposed in RFC 23.

=back

=head1 IMPLEMENTATION

Some list generation functions can be short-circuited. For instance,
although (1..100000:5) is just a shorthand for (1:^_+5:100000), if it was
implemented this way then:

  @a = (0..:5);
  print $a[99999];

would take an awfully long time to run and would memoize $a[0..99998],
which probably isn't ideal. So slices and normal lists (at the very least)
should know that {0+5+5+5...+5} can be quickly calculated as {5*n}. There
may be other functions that can short-circuit as special cases, but that
is starting to sound complex.

It will be common in numerical programming to see multiple layers of
indirection:

  @a = (1:5:100000);
  @b = getBigImage();
  @c = @a(@b);
  print $c[5];

In these cases Perl should consolidate as much as possible at compile time
to avoid too much overhead.

Dan Sugalski's comments on perl6-internals provide some insight about
potential lazy evaluation and memoization approaches:

=over 4

=item *

I was thinking we might keep a bitmap for used/unused cells (with unused
doubling as undef) and a two-level array pointing to chunks of real
elements/element pointers.

So, for example, if you did a:

    $foo[$elem];

perl would first check bit $elem to see if that element is in use. If
so, it'd do a ($elem / chunk_size) to get the index to the chunk
pointer in the array structure, then access element ($elem %
chunk_size) to get a pointer to the ultimate thing. (Or the thing
itself, if it were an int, say)

Other methods are possible, of course, and which would be a win depends on
your element distribution. (A linked list would be a win for very sparse
arrays--$foo[time])

=back

=head1 REFERENCES

RFC 24: Semi-finite (lazy) lists

Memoization in Perl:
  http://www.plover.com/~mjd/perl/MiniMemoize/

List comprehension in Haskell:
  http://www.numeric-quest.com/haskell/hcompanion/principles.html#List
comprehension
  Fethi Rabhi and Guy Lapalme: I<Algorithms: A functional programming
approach>,
    Addison-Wesley, 235 pages, paperback, 1999. ISBN 0-201-59604-0

=head1 ACKNOWLEDGEMENTS

  Damian Conway: Reviewed first draft
  Karl Glazebrook: Suggested splitting from infinite lists proposal
  Christian Soeller: Original 'slice' RFC; suggested argument reordering
  Dan Sugalski: Implementation ideas


Reply via email to