Re: Synopsis 9 draft 1

Larry Wall Fri, 03 Sep 2004 09:29:57 -0700

On Fri, Sep 03, 2004 at 10:00:24AM -0500, Jonathan Scott Duff wrote:
: On Thu, Sep 02, 2004 at 04:47:40PM -0700, Larry Wall wrote:
: >     my ref[Array] @ragged2d;
: 
: What is a "ref" type exactly? Is it like a pointer in C?


It's exactly like a reference in Perl 5.  Declaring a compact array of
"ref" is merely declaring that the array will only hold references
to Perl 6 data structures, and doesn't have to worry about holding
value types like int or num.  It may come out to the same thing as an
ordinary array, depending on how Parrot ends up defining references
internally.

: If so, and
: based on the parameterization above, I assume that there will also be
: the appropriate pointer arithmetic such that if $fido is declared as a
: ref[Dog] and pointed at an array of Dogs, then $fido++ will move to the
: next Dog in the array. Something like this:
: 
:       my Dog @pound;
:       my ref[Dog] $fido;
:       # after we've populated the @pound with Dogs ...
:       loop ($fido = @pound[0]; ?$fido; $fido++) {
:          $fido.bark();                
:       }

I don't see any more reason to allow that in Perl 6 than in Perl 5.

: Is there some other syntax to get a compact array of things?  Do I
: need an attribute on the array?

I don't know what you mean by "of things".

: > the presence of a low-level type tells Perl that it is free to
: > implement the array with "compact storage", that is, with a chunk
: > of memory containing contiguous (or as contiguous as practical)
: > elements of the specified type without any fancy object boxing that
: > typically applies to undifferentiated scalars.  (Perl tries really
: > hard to make these elements look like objects when you treat them
: > like objects--this is called autoboxing.)
: 
: Will it also try really hard to use compact storage or are the
: low-level types just compiler hints?  (As a first approximation, could
: all of the int types be implemented as Ints with appropriate
: serialization?)

I suspect the compiler will try harder for composite types than for
scalars, but certain algorithms will run much faster if they can use
Parrot integer registers rather than PMCs, so int and num scalars
are also likely to be implemented by the Perl 6 compiler directly.
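
To make the compact-versus-boxed distinction concrete, here is a rough analogy in Python rather than Perl 6, since the syntax in this thread was still in flux. It is only a sketch under stated assumptions: Python's `array` module stands in for a compact `int` array and a plain list stands in for an array of boxed scalars. Nothing here reflects how Parrot actually stores either one.

```python
from array import array
import sys

# Compact storage: one contiguous buffer of machine integers,
# with no separate object per element.
compact = array("q", range(1000))   # "q" = signed long long, 8 bytes on common platforms

# Boxed storage: a list of pointers to separately allocated integer objects.
boxed = list(range(1000))

compact_bytes = compact.itemsize * len(compact)
# Size of the integer objects alone, ignoring the list's own pointer table.
boxed_bytes = sum(sys.getsizeof(x) for x in boxed)

# Elements still read back as ordinary integers, loosely like autoboxing.
element = compact[10]
```

The per-element overhead of the boxed form is what a low-level type declaration gives the compiler permission to avoid.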

: > To declare a multidimensional array, you add a shape parameter:
: > 
: >     my num @nums is shape(3);   # one dimension, @nums[0..2]
: >     my int @ints is shape(4;2); # two dimensions, @ints[0..3; 0..1]
: 
: Maybe it's just my BASIC upbringing, but "shape" doesn't seem like the
: right word.  Words like "dimension" and "cardinal" fit better in my
: head, but I'd want them shorter and "dim" and "card" don't quite work
: either ;-)
: 
: But "shape" makes me want to do something like this:
: 
:       my num @a is shape('triangle');
:       my num @b is shape('octagon');
:       my num @c is shape('square');
: 
: That might make sense for triangles, but not the others (unless
: I'm just suffering a failure of imagination)
: 
: "size" could even work though it's vague. Maybe even "basis" though
: that's not quite right either.  Or perhaps "extent"?
: 
: Anyway ...my two cents.  If "shape" is carved in stone, I'll live with
: it :)

I picked it only because that's what the PDL folks came up with
in their series of RFCs after their own round of discussions.
That doesn't mean we can't change it if we do come up with something
better.  But I rather like shape.  It's short, and not easily confused
with other Perl 6 concepts.

: > If you wanted that C<0..2> range to mean
: > 
: >     @nums[0;1;2]
: > 
: > instead, then you need to use that C<semi> we keep mentioning:
: > 
: >     @nums[semi 0..2]
: 
: If I had 
: 
:       @a = (0,undef,2); 
: 
: would 
: 
:       @nums[semi @a] 
: 
: be the same as
: 
:       @nums[0;*;2]
: 
: ?

Dunno.  I suspect we can allow

    @a = (0,*,2);

in the indirect form in any event.  But the '*' is also still negotiable.
As is the "semi", for that matter.

: > XXX It's not clear whether C<[EMAIL PROTECTED]> should return the size of the
: > entire
: > array or the size of the first dimension (or the scalar value of the
: > entire array if it's really a zero-dimensional array!).  I've put
: > C<[EMAIL PROTECTED]> above just in case.
: 
: hmm
: 
: my int @a is shape(3;4;5);
: 
: [EMAIL PROTECTED];;] == 3   (same as [EMAIL PROTECTED], right?)
: [EMAIL PROTECTED];*;] == 4   (same as [EMAIL PROTECTED];*])
: [EMAIL PROTECTED];;*] == 5   
: 
: BTW, could these also be made to work (or something similar)?

That seems rather opaque to me.  Better is

    @a.shape[0] == 3
    @a.shape[1] == 4
    @a.shape[2] == 5
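
As an illustration of querying dimensions through a shape list, here is a small Python analogy (not Perl 6). The `ShapedArray` class, its flat row-major storage, and its method names are all hypothetical; it only sketches the idea that the shape is an ordinary indexable value attached to the array.

```python
from math import prod

class ShapedArray:
    """Hypothetical stand-in for an array declared with a fixed shape."""

    def __init__(self, *shape, fill=0):
        self.shape = tuple(shape)               # e.g. (3, 4, 5)
        self.data = [fill] * prod(self.shape)   # flat, row-major storage

    def _offset(self, index):
        # Map a full multidimensional index to a position in the flat storage.
        if len(index) != len(self.shape):
            raise IndexError("wrong number of dimensions")
        offset = 0
        for i, extent in zip(index, self.shape):
            if not 0 <= i < extent:
                raise IndexError("index out of range")
            offset = offset * extent + i
        return offset

    def __getitem__(self, index):
        return self.data[self._offset(index)]

    def __setitem__(self, index, value):
        self.data[self._offset(index)] = value

a = ShapedArray(3, 4, 5)
a[2, 3, 4] = 42   # last element of the 3 x 4 x 5 array
```

With this, `a.shape[0]`, `a.shape[1]`, and `a.shape[2]` give 3, 4, and 5.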

: my int @b;
: my int @a is shape(10;5;7);
: @b = @a[*:by(2);;]            # @b is now shape(5;5;7)
: @b = @a[;1,4;]                # @b is now shape(10;2;7)
: @b = @a[;(*);]                # @b is now shape(10;7)
: @b = @a[;;;*]                 # @b is now shape(10;5;7;1)
: @b = @a[;;;*5]                # @b is now shape(10;5;7;5)

I don't like notation that uses null slices to mean everything,
because, as you can see, you end up with long sequences of delimiters,
and I think it's psychologically more valuable to make people count the
somethings than the nothings.  So I'm thinking to allow null slices
to mean "everything" only on the trailing slices out of convenience
so you can drop the trailing semicolons, especially when you don't
actually know the dimensionality.  But "everything" slices in front
need to use '*'.  So I'd write the above as:

    my int @b;
    my int @a is shape(10;5;7);
    @b = @a[*:by(2)]            # @b is now shape(5;5;7)
    @b = @a[*;1,4]              # @b is now shape(10;2;7)
    @b = @a[*;(*)]              # @b is now shape(10;7)
    @b = @a[*;*;*;*]            # @b is now shape(10;5;7;1)
    @b = @a[*;*;*;*5]           # @b is now shape(10;5;7;5)

I'm not sure I buy the last three of those, however.  My inclination
is to make it illegal to use too many dimensions.  And the
middle one doesn't make any sense to me.  You can't just refuse to
slice the middle dimension, because you have multiple values, unless
you're thinking it turns into shape(10;7;5) instead.  But using parens
for something like that is a no-go, since a slice expression needs
parens for whatever grouping it's going to do.  * should mean the
same as (*), ((*)), (((*))), etc. for any definition of *.
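
The slicing rule sketched above (missing trailing slices mean "everything", leading ones must be spelled with '*', and too many dimensions is an error) can be modelled roughly as follows. This is a Python analogy, not Perl 6, and every name in it is an assumption made for illustration: `ALL` stands in for '*', a Python `slice` with a step stands in for `*:by(2)`, and a list of indices stands in for `1,4`. Rejecting extra dimensions follows the inclination stated above, not a settled rule.

```python
ALL = object()   # hypothetical stand-in for the '*' slice

def sliced_shape(shape, *slices):
    """Return the shape that results from applying `slices` to `shape`.

    Each slice is ALL (every index), a Python slice (such as a stride),
    or a list of explicit indices. Missing trailing slices mean ALL.
    """
    if len(slices) > len(shape):
        raise IndexError("too many dimensions in slice")
    result = []
    for dim, extent in enumerate(shape):
        s = slices[dim] if dim < len(slices) else ALL
        if s is ALL:
            result.append(extent)
        elif isinstance(s, slice):
            # Count how many indices the strided slice selects.
            result.append(len(range(extent)[s]))
        else:
            if any(not 0 <= i < extent for i in s):
                raise IndexError("index out of range")
            result.append(len(s))
    return tuple(result)

strided = sliced_shape((10, 5, 7), slice(None, None, 2))   # like @a[*:by(2)]
picked = sliced_shape((10, 5, 7), ALL, [1, 4])             # like @a[*;1,4]
```

Here `strided` corresponds to the shape(5;5;7) case and `picked` to the shape(10;2;7) case, while a call with four slices on a three-dimensional shape raises `IndexError`.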

Larry
