Re: RFC 231 (v1) Data: Multi-dimensional arrays/hashes and slices

Jeremy Howard Sat, 16 Sep 2000 01:09:33 -0700
Ilya Zakharevich wrote:
> On Sat, Sep 16, 2000 at 11:08:18AM +1100, Jeremy Howard wrote:
> >  - How does it relate to RFC 204? Is it an alternative, or an addition?
>
> 204 cannot be implemented since it prohibits usage of overloaded
> objects as array indices.
>
Why is it important for overloaded objects to be used as array indices? Why
does RFC 204 rule that out? RFC 204 simply specifies that a list reference
as an index provides multidimensional access:

  $a[ [1,1] ] == $a[1][1];
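A minimal Python sketch of the RFC 204 rule above. It assumes nested Python lists stand in for a Perl list of lists, and that a list/tuple index stands in for a list reference. `md_get` is a hypothetical helper, not an API from any RFC.

```python
from typing import Any, Sequence


def md_get(arr: Sequence[Any], index: Any) -> Any:
    """Hypothetical helper: emulate RFC 204's $a[ [i, j, ...] ] on nested lists."""
    if isinstance(index, (list, tuple)):
        # A list index is a path: consume one coordinate per dimension.
        value: Any = arr
        for i in index:
            value = value[i]
        return value
    # Any other index behaves like ordinary one-dimensional indexing.
    return arr[index]


a = [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
print(md_get(a, [1, 1]) == a[1][1])  # True
```

The `isinstance` test is where Ilya's objection appears to bite, as this sketch reads it: in Perl an overloaded object is itself a blessed reference, so a rule keyed on "the index is a reference" has to tell such an object apart from a list reference somehow.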

> >  - How does it relate to RFC 81? The semantics of '..' seem to
> > conflict.
>
> What I say concerns the usage of '..' inside an index only.  It cannot
> conflict with anything else.
>
RFC 81 expands on the existing operator '..' in a list context to allow more
generic list generation. It is particularly useful to generate lists to act
as array slices:

  @a[ 1..5 : 3] == @a[1,3,5];

This would seem to conflict with the meaning of '..' outlined in RFC 231.

> >  - Why is it better to make ';' "special inside a hash/array index only"
>
> Because ',' is already special there.  There is little chance that ';'
> operator is created as a general-purpose operator.
>
When we first discussed ';' on the list, we looked at making it special in
an index only. But the more generic approach of making it a Cartesian
product operator seems cleaner: it avoids 'special' meanings in favour of
providing a generic operator.
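To make the "generic operator" reading concrete, here is a Python sketch in which a hypothetical function `semicolon` plays the role of ';' as a Cartesian product operator usable in any list context, not just inside an index. It is built on `itertools.product`; nothing here is proposed Perl syntax.

```python
from itertools import product
from typing import Iterable, List, Tuple


def semicolon(*operands: Iterable) -> List[Tuple]:
    """Hypothetical stand-in for a general-purpose ';': the Cartesian product
    of its operands, with the last operand varying fastest."""
    return list(product(*operands))


print(semicolon([1, 2], [3, 4]))               # [(1, 3), (1, 4), (2, 3), (2, 4)]
print(len(semicolon([0, 1], [0, 1], [0, 1])))  # 8: any number of operands works
```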

Why is there little chance of creating ';' as a general-purpose operator?

> >  - Why is a special token for a separator necessary "to avoid the (giant)
> > overhead of creation of anonymous arrays"? Don't RFC 203 arrays and RFC
> > 81/205 lazy generation avoid this?
>
> a) "Lazy generation" is not defined, as stated it is a good wish only.
>    What is
>
>      @a = (0, 2..99, 200..9998, 1000000);
>      f(@a);
>
Lazy generation is a well-understood concept in other languages. I'm most
familiar with C++, so I'll draw from that. In libraries that provide lazy
evaluation, f(@lazy_list) is a 'promise' to apply f() to the elements of
@lazy_list when an element of f(@lazy_list) needs to be calculated.
Sometimes this is all done at runtime (MTL, newmat), sometimes parts are
done at compile time ('expression templates' in POOMA and Blitz++). These
C++ examples and many others are indexed at:

  http://www.oonumerics.org/oon/
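The "promise" idea can be sketched with Python's lazy iterators, assuming they are a fair stand-in for a lazily generated Perl list (they are single-pass, which a real Perl array would not be). Ilya's list `(0, 2..99, 200..9998, 1000000)` becomes a chain of ranges, and `f` runs only when an element is actually requested. This is not how MTL or Blitz++ are implemented; `f` and the `evaluated` log are illustrative only.

```python
from itertools import chain, islice

evaluated = []  # records every element f is actually applied to


def f(x: int) -> int:
    evaluated.append(x)
    return x * x


# Perl ranges are inclusive and Python's range() is not, hence the +1 upper bounds.
lazy_a = chain([0], range(2, 100), range(200, 9999), [1000000])

# map() is lazy in Python 3: this is a promise to apply f, not a computation.
lazy_fa = map(f, lazy_a)
assert evaluated == []

# Demanding three elements evaluates f exactly three times.
print(list(islice(lazy_fa, 3)))  # [0, 4, 9]
print(evaluated)                 # [0, 2, 3]
```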

> b) The call for $a[2,3;5,6] is
>
>   *) Put already-available SV pointers for $a, 2,3,5,6 and the cached SV*
>      for tie::multi::separator() on stack;
>
>   *) Put the (cached) CV* for the method on stack;
>
>   *) invoke the call frame;
>
> This is not *very* quick, but at least it may be "not that slow".
> While all the alternatives require creation of anonymous lists, which
> (I expect) will slow things down 7..10 times for the call above.  For
> $a[1..100;1..100] it may easily be 100..1000 times slower.
>
RFC 203 proposes that lists of lists of a known simple type be stored as
true arrays (i.e. contiguously in memory), so their overhead is not the same
as that of Perl 5 lists of lists.
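As a rough illustration of the storage difference, the sketch below keeps a 2-D array of doubles in one contiguous buffer (Python's `array` module) and maps `(i, j)` to an offset in row-major order. The class name and the row-major layout are assumptions for illustration; nothing here states which layout RFC 203 would use.

```python
from array import array


class DenseMatrix:
    """Hypothetical 2-D array of doubles stored contiguously, row-major."""

    def __init__(self, rows: int, cols: int, fill: float = 0.0):
        self.rows, self.cols = rows, cols
        # One flat buffer of C doubles instead of a list of per-row lists.
        self.data = array("d", [fill]) * (rows * cols)

    def _offset(self, i: int, j: int) -> int:
        # Row-major layout: element (i, j) lives at i * cols + j.
        return i * self.cols + j

    def __getitem__(self, ij):
        i, j = ij
        return self.data[self._offset(i, j)]

    def __setitem__(self, ij, value: float) -> None:
        i, j = ij
        self.data[self._offset(i, j)] = value


m = DenseMatrix(3, 4)
m[1, 2] = 5.0
print(m[1, 2], m.data[6])  # 5.0 5.0: (1, 2) sits at offset 1 * 4 + 2
```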

The index in $a[1..100;1..100] should be generated lazily. An individual
element can be calculated directly from the index parameters as required.
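Computing an element "directly from the index parameters" can be sketched as mixed-radix decoding: position `k` in the index list of `$a[1..100;1..100]` maps straight to a coordinate pair, so the 10,000 pairs never have to be built. `LazyProduct` is a hypothetical Python class, and treating the last dimension as fastest-varying is an assumption.

```python
from math import prod
from typing import Tuple


class LazyProduct:
    """Hypothetical lazy index list for slices like $a[1..100;1..100].

    Each operand is an inclusive, Perl-style range given as (lo, hi).
    """

    def __init__(self, *ranges: Tuple[int, int]):
        self.ranges = ranges
        self.sizes = [hi - lo + 1 for lo, hi in ranges]

    def __len__(self) -> int:
        return prod(self.sizes)

    def __getitem__(self, k: int) -> Tuple[int, ...]:
        if not 0 <= k < len(self):
            raise IndexError(k)
        # Decode k as a mixed-radix number, last dimension varying fastest.
        coords = []
        for (lo, _hi), size in zip(reversed(self.ranges), reversed(self.sizes)):
            k, r = divmod(k, size)
            coords.append(lo + r)
        return tuple(reversed(coords))


idx = LazyProduct((1, 100), (1, 100))
print(len(idx), idx[0], idx[100], idx[9999])  # 10000 (1, 1) (2, 1) (100, 100)
```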

> Your way was my way when I was designing Math::Pari.  When I
> *implemented* Math::Pari, it took some time to determine why it was so
> much slower than what I expected.  My proposal is based on this
> experience.
>
> Creation of [1,2,3] is *very* slow.
>
I hope we can change how [1,2,3] is created by:

 - Creating a true numeric array if it is an array of known simple types
 - Generating the elements lazily where it is more efficient to do so

If we cannot do either of these, then I agree that RFCs 204 and 205 are not
viable in their current form.

> >  - Overall, what is the problem in the existing array RFCs that this is
> > designed to solve?
>
> *) They are not compatible with overloading (unless overloaded things
>    are dramatically changed);
>
There are a number of RFCs proposing substantially changing overloading.
What specific changes would we need to ensure are incorporated in Perl 6 to
avoid this incompatibility?

> *) They create a lot of temporary anonymous arrays the only purpose of
>    which is to group arguments;
>
Yes, if we can't get any lazy generation to work.

> *) They go very high on the bizarreness scale.
>
Bizarre??? Which RFC?

RFC 82: The concept of all array operations being applied element-wise to
arrays is very widely used in languages oriented to numeric programming--it
is certainly not 'bizarre'. There has been debate around '||' and '&&',
although I find the alternative meaning of these in a list context proposed
by RFC 45 more bizarre. ...But I think that this point is already well
discussed...
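For readers less familiar with numeric languages, element-wise application simply pairs up corresponding elements. A Python sketch, assuming equal-length operands: how RFC 82 treats unequal lengths is not covered here, so raising an error is this sketch's own choice, and `elementwise` is a hypothetical name.

```python
import operator
from typing import Callable, List


def elementwise(op: Callable[[float, float], float],
                a: List[float], b: List[float]) -> List[float]:
    """Apply a binary operator element by element (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return [op(x, y) for x, y in zip(a, b)]


print(elementwise(operator.add, [1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
print(elementwise(operator.mul, [1, 2, 3], [10, 20, 30]))  # [10, 40, 90]
```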

RFCs 90 and 91: These builtins are in almost all languages with rich array
functionality. 'merge' and 'demerge' are more frequently called 'zip' and
'unzip', but those terms were almost universally rejected on -language.
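A sketch of what 'zip' and 'unzip' usually mean, written as `merge`/`demerge` in Python. It assumes `merge` interleaves its argument lists into one flat list (Perl lists being flat) and that `demerge(n, ...)` splits such a list back into `n` lists; RFCs 90 and 91 themselves are the authority on the exact argument conventions.

```python
from itertools import chain
from typing import List, Sequence


def merge(*lists: Sequence) -> list:
    """Interleave equal-length lists into one flat list (a flattened zip)."""
    return list(chain.from_iterable(zip(*lists)))


def demerge(n: int, flat: Sequence) -> List[list]:
    """Split a flat interleaved list back into n lists (an unzip)."""
    return [list(flat[i::n]) for i in range(n)]


merged = merge([1, 2, 3], ["a", "b", "c"])
print(merged)              # [1, 'a', 2, 'b', 3, 'c']
print(demerge(2, merged))  # [[1, 2, 3], ['a', 'b', 'c']]
```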

RFC 203: If we know that a list of lists is of a simple type, why not store
it efficiently? And why not add an attribute to give us optional bounds
checking?
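The "optional bounds checking" idea can be sketched as a typed array whose index check is switched on per instance. Here `check_bounds` is a hypothetical stand-in for the RFC 203 attribute, not its actual name. In Python the visible effect is that negative indices, which otherwise count from the end as they do in Perl, become errors.

```python
from array import array


class TypedArray:
    """Hypothetical typed 1-D array with opt-in bounds checking."""

    def __init__(self, size: int, typecode: str = "d", check_bounds: bool = False):
        # The underlying array rejects values of the wrong type, e.g. a str in "d".
        self.data = array(typecode, [0]) * size
        self.check_bounds = check_bounds

    def _check(self, i: int) -> None:
        if self.check_bounds and not 0 <= i < len(self.data):
            raise IndexError(f"index {i} outside 0..{len(self.data) - 1}")

    def __getitem__(self, i: int):
        self._check(i)
        return self.data[i]

    def __setitem__(self, i: int, value) -> None:
        self._check(i)
        self.data[i] = value


loose = TypedArray(3)
strict = TypedArray(3, check_bounds=True)
print(loose[-1])  # 0.0: counts from the end, no error
```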

RFC 204: Isn't it fairly intuitive that:

  $a[ [1,1] ] == $a[1][1];

and, given that @a[1,2] is an array slice, shouldn't it then follow that

  @a[ [1,1], [2,2] ] == ($a[1][1], $a[2][2]);
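Continuing the same Python emulation, a slice with several list-reference indices resolves each path in turn. `md_slice` is hypothetical; `functools.reduce` with `operator.getitem` walks one dimension per coordinate.

```python
from functools import reduce
from operator import getitem
from typing import Any, Sequence, Tuple


def md_slice(arr: Sequence[Any], *paths: Sequence[int]) -> Tuple[Any, ...]:
    """Hypothetical emulation of @a[ [i1, j1], [i2, j2], ... ] on nested lists."""
    # reduce(getitem, [1, 1], arr) is arr[1][1]
    return tuple(reduce(getitem, path, arr) for path in paths)


a = [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
print(md_slice(a, [1, 1], [2, 2]))  # (11, 22)
```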

RFC 205: When first proposed, everyone on this list felt that:

  @a[ 1..3 ; 1..3 ]

is a fairly intuitive way of writing a 2d list slice, since it is in line
with how most other languages write slices. The observation that here ';' is
acting to create a cartesian product leads to the generalisation to any list
context.
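Read this way, `@a[ 1..3 ; 1..3 ]` selects every `(i, j)` in the Cartesian product of the two index ranges. A Python sketch of that reading, with the hypothetical `slice_2d` returning a flat list as a Perl slice would; the row-major order of the result is an assumption.

```python
from itertools import product
from typing import Any, Iterable, List, Sequence


def slice_2d(a: Sequence[Sequence[Any]], rows: Iterable[int],
             cols: Iterable[int]) -> List[Any]:
    """Hypothetical emulation of @a[ rows ; cols ] on a nested list."""
    return [a[i][j] for i, j in product(rows, cols)]


# 4x4 grid where grid[i][j] == 10 * i + j.
grid = [[10 * i + j for j in range(4)] for i in range(4)]
# Perl's 1..3 is inclusive, so it becomes range(1, 4).
print(slice_2d(grid, range(1, 4), range(1, 4)))
# [11, 12, 13, 21, 22, 23, 31, 32, 33]
```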

I think the level of consensus achieved with the syntax proposed in RFCs 81,
204, and 205 speaks volumes. People on this list have a very wide variety of
backgrounds, in terms of preferred languages. The toing and froing involved
to reach consensus has been substantial. If there are implementation
challenges involved in these RFCs then a contingency plan should seek to
match the actual syntax of these RFCs as closely as possible.

RFC 206: If $#list gives the upper bound of @list, there's nothing bizarre
about @#array giving the upper bounds of a multidimensional array.
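By analogy with `$#list`, the last index of `@list`, a Python sketch of RFC 206's `@#array` for a nested list returns the last index in each dimension. `upper_bounds` is hypothetical, and it assumes the array is rectangular so that the first element at each level is representative.

```python
from typing import Any, List


def upper_bounds(a: Any) -> List[int]:
    """Hypothetical analogue of @#array: last valid index per dimension."""
    bounds = []
    while isinstance(a, list):
        bounds.append(len(a) - 1)  # like $#list: -1 for an empty array
        if not a:
            break
        a = a[0]  # assumes a rectangular array
    return bounds


print(upper_bounds([[0] * 4 for _ in range(3)]))  # [2, 3]
```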

RFC 207: This is the closest array proposal to deserving a 'bizarre' tag.
I'm not sure that we've really sorted this yet, although I think we all
agree that _some_ efficient way of looping through an array is desirable.

> >  - Can we incorporate a solution into the existing RFCs without creating a
> > new conflicting one?
> >
> > If there are implementation challenges around the existing RFCs, I would
> > rather make changes required to overcome them within those RFCs.
>
> I see no way how the existing RFC can be accepted.
>
We've been discussing the various -data RFCs for a few weeks now, trying to
resolve the many issues that have come up. On three occasions I've posted
summaries of key points from these discussions on -internals and requested
input on which parts we should rule out because they cannot be implemented.
On each occasion I've had a positive response. We are now aiming to freeze
these RFCs by Wednesday to give Larry reasonable time to consider them
before his release of the draft spec.

Ilya, are you saying that all the time we've put into these RFCs has been
wasted? When you say that the existing RFC cannot be accepted, are you
referring to 203, 204, 205, or some other proposal or group of proposals?
How do you suggest that we proceed, given that Larry is announcing his
initial decisions in just two weeks? Our plan on this list had been to
put forward a set of RFCs that contain our consensus views.
This has already involved numerous RFCs being either withdrawn or combined.
Where we cannot reach consensus, we have been planning to pull out only
those specific parts which conflict and create cross-referencing RFCs for
each. If we're going to incorporate RFC 231 into this process then we've got
a lot of work to do over the next few days!

> (No, I could not
> read the "include all the PDL" proposal to the end, so I cannot
> comment on this.)
>
There is no 'include all the PDL' proposal. RFC 116 was written in
response to a request from Jarkko to specify how PDL currently works, in
order to ensure that learning from that project is captured. It simply
describes what PDL currently does, and does not propose to simply include it
in Perl 6. Personally, I really appreciate the effort of the PDL Porters
team in writing this--I found it a very informative document. I have asked
PDL Porters to clarify the role of this document in the next version so that
it is completely clear that it is simply meant as a backgrounder, not a
proposal.

> > That way we get the benefit of the thought we've all put into the syntax
> > of these RFCs, plus the benefit of Ilya's deep understanding of Perl
> > internals.
>
> Thank you for suggesting that I do not need to think to create a RFC.
>
That's not what I meant. I'm sorry if it came across like this.

Ilya, I'm grateful that you are putting your considerable experience and
intellectual horsepower into helping to design the data-crunching
capabilities of Perl 6. It's just going to be difficult to incorporate your
input so late in the process. It would be nice to at least get to a point
where the specific issues you're reacting to in the -data RFCs could be
explicitly mentioned in those RFCs, and then RFC 231 could be revised to
more thoroughly cross-reference these RFCs. This will make it much easier
for Larry to work through these proposals.