RFC 90 (v4) Arrays: merge() and unmerge()

Perl6 RFC Librarian Thu, 21 Sep 2000 12:59:26 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Arrays: merge() and unmerge()

=head1 VERSION

  Maintainer: Jeremy Howard <[EMAIL PROTECTED]>
  Date: 10 Aug 2000
  Last Modified: 21 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 90
  Version: 4
  Status: Frozen

=head1 DISCUSSION

Two major issues were discussed. One was that merge and unmerge were not
deserving of being placed in the core. In the end this is a judgement
call, but merge() in particular is fundamental to programming in a
functional style, to manipulating multidimensional matrices, and for
iterating through multiple arrays simultaneously. Furthermore,
implementing optimised aliasing behaviour as proposed may not be possible
in a module (depending on what extension mechanisms are in Perl 6).

The other issue discussed was whether the aliasing behaviour is
appropriate, and achievable. A section has been added to this RFC
discussing this.

=head1 ABSTRACT

It is proposed that two new functions, C<merge>, and C<unmerge>, be added
to Perl. C<merge(@list1, @list2, ...)> would return a list that
interleaved its arguments. C<unmerge($num_lists, @list)> would reverse
this operation. Both functions would return an alias into the original
list, not a copy of the elements.

=head1 CHANGES

=head2 Since v3

=over 4

=item *

Name change from C<demerge> to C<unmerge>

=item *

Added discussion of options for aliasing behaviour

=back

=head2 Since v2

=over 4

=item *

Described aliasing behaviour of C<merge> and <unmerge>

=back

=head2 Since v1

=over 4

=item *

Moved list to [EMAIL PROTECTED]

=item *

Changed name from zip/unzip

=item *

Pass lists directly in examples, not references

=item *

Change 2nd argument from $list_size to $num_lists

=back

=head1 DESCRIPTION

It is proposed that Perl implement a function called C<merge> that
interleaves the arguments of arrays together, and is evaluated lazily.
For instance:

  @a = (1,3,5);
  @b = (2,4,6);
  @merged_list = merge(@a,@b);   # (1,2,3,4,5,6)

This makes it easy to operate on multiple lists using flexible reduction
functions:

  $sum_xy = sub {reduce ^last+^x*^y, merge($_[0], $_[1])};
  print $sum_xy->(@a, @b);   # Prints '44', i.e. 1*2+3*4+5*6

In order to reverse this operation we need an C<unmerge> function:

  @merged_list = merge(@a,@b);   # (1,2,3,4,5,6)
  @unmerged_list = unmerge(2, @merged_list);   # ([1,3,5], [2,4,6])

The second argument to C<unmerge> is the number of lists that are to be
created (i.e. the number of lists that would have been C<merge>d for this
to reverse the operation).

If the list to be unmerged is not an exact multiple of the partition size,
the final list references are not padded--their length is one less than
the list size. For example:

  @list = (1..7);
  @unmerged_list2 = unmerge(3, @list);   # ([1,4,7], [2,5], [3,6])

Both C<merge> and <unmerge> do not make a copy of the elements of their
arguments; they simply create an alias to them:

  @a = (1,3,5);
  @b = (2,4,6);
  @merged_list = merge(@a,@b);   # (1,2,3,4,5,6)
  $merged_list[1] = 0;
  @b == (0,4,6);                 # True
    
=head1 IMPLEMENTATION

The C<merge> and C<unmerge> functions should be evaluated lazily.

C<merge> and C<unmerge> return an alias into the original list, not a copy
of the elements.

Effectively, C<merge> creates an iterator over multiple lists. If used as
part of a reduction, the actual interleaved list need never be created.
For instance:

  $sum_xy = sub {reduce ^last+^x*^y, merge($_[0], $_[1])};
  $answer = $sum_xy->(@a, @b);
  
should be evaluated as if it read:

  $answer = 0;
  $answer += $a[$_] * $b[$_] for (0..$#a-1));
  
which does not need to create an intermediate list.

=head2 Aliasing implementation

The proposed aliasing behaviour is also proposed for part()/flatten() (see
L<RFC 91>) and reshape() (see L<RFC 148>). This requires some more
thought. Perl's current slicing operation C<@a[$x1, $x2]> only aliases
when used in an lvalue context:

  @a = (4,5,6);
  @b = @a[1,2];
  @b[0] = 9;       # @a == (4,5,6)
  @a[1,2] = (8,9); # @a == (4,8,9)

We could do the same for merge() and friends. The downside is that:

  @transpose = part(
    # Find the size of each column
    scalar @list_of_lists,
    # Interleave the rows
    merge(@list_of_lists);
  )

and similar expressions would do an awful lot of copying. Ideally if
merge() didn't alias in an rvalue context, Perl would still optimise
away multiple merge()s, part()s, slices, and so forth so that only one
copy occurred.

If the aliasing behaviour is implemented, then assigning an aliased array
to another array should result in a copy being created:

  @a = (1,3,5);
  @b = (2,4,6);
  @merged_list = merge(@a,@b);   # Just an alias into @a and @b
  @copied_list = @merged_list;   # Does an actual array copy

Alternatively, a copy-on-write optimisation could allow some of the
efficiency of full aliasing to be combined with the simplicity of Perl 5's
alias-in-lvalue behaviour.

=head1 REFERENCES

RFC 23: Higher order functions

RFC 76: Builtin: reduce

RFC 91: Arrays: part() and flatten()

RFC 148: Arrays: Add reshape() for multi-dimensional array reshaping
RFC 90 (v4) Arrays: merge() and unmerge()

Reply via email to