> my/our @array :hashsyntax;
>
> would hide the definition of %array in the same way that
>
> my/our %array
>
> would hide a prior definition of %array. And references to %array
> would thenceforth actually be references to the keyed array @array.
I can see massive confusion from this. It's bad enough that globbing allows
multiple data types with the same name, and it's really bad that the casual
observer has to stretch their mind like crazy to interpret @$sym{qw(a b c)},
which non-intuitively changes context with every character (scalar to ref to
array ref to hash slice, etc.). Then came pseudo-hashes, which thankfully
aren't used for much other than objects (where their true natures are
somewhat hidden). Now you're suggesting colliding globbing namespaces.
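(For anyone following along, a minimal illustration of that gymnastics, with
a made-up $sym; the point being that one expression reads as scalar, ref and
hash slice all at once:)

    my $sym  = { a => 1, b => 2, c => 3 };  # $sym is a scalar holding a hash ref
    my @vals = @$sym{qw(a b c)};            # hash slice through the ref, i.e.
                                            # ($sym->{a}, $sym->{b}, $sym->{c})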
What I think you're saying is that hashes and arrays would be totally merged
into a single data type. That might work, but we'd be alienating existing
users who don't want to retrain but have to work alongside perl6 developers.
I'm not saying it's bad, just that I know it'll affect my work environment.
> The syntaxes
>
> $foo['element']
> $foo{element]
Typo. Which was it: $foo{element} or $foo[element]?
> So, starting with
>
> my @foo:key; # empty array
> $foo ['month'] = 10; # $#foo == 1, $foo[0] == 10
> $foo ['day'] = 20; # $#foo == 2, $foo [1] == 20
> $foo ['year'] = 30; # $#foo = 3, $foo [2] == 30
> We achieve an array with 3 elements. There is a clear parallel between
> this and
>
> my %foo;
> $foo{'month'} = 10;
> $foo{'day'} = 20;
> $foo{'year'} = 30;
>
> However, the lookups for @foo are done at compile time, the lookups for
> %foo are done at runtime.
OK, an implementation problem. Are you suggesting that we really implement
pseudo-hashes behind the scenes? You mention namespaces, but AFAIK those are
just normal Perl hashes (with special values), so this sounds like
pseudo-hashes by another name. The difference is that now you're requiring a
new symbolic hash to be generated for each ":key" array, whereas
pseudo-hashes let you reuse the same old hash (as used in perl5.005's OO).
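(For reference, and to make sure we're talking about the same thing: a Perl 5
pseudo-hash is nothing more than an array ref whose first element maps key
names to indexes. A minimal sketch with made-up values:)

    my $ph = [ { month => 1, day => 2, year => 3 }, 10, 20, 30 ];
    print $ph->{month};   # 10 -- looked up through the map in $ph->[0]
    print $ph->[1];       # 10 -- the same slot, by index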
As an alternative, use an attribute that does not interpolate the field
name. Make it more like a C struct, where you _have_ to fully specify the
field name. With this you could still generate the fields dynamically, as
above (so you don't have to specify them ahead of time), but by the time the
compiler is finished the references are fixed and converted to array
indexes. Moreover, you get a special type of fixed array (as opposed to a
resizable one). You get all the space and performance benefits of
C structures while not totally throwing away the flexibility of hashes.
The only way you could achieve a dynamic field-name lookup would be through
an eval. That slows its usage down, but if you really wanted to use hashes,
you'd use some other attribute on the variable. Actually, thinking more on
that, you couldn't totally fix the array size if you could 'eval' the
structure with additional field names. The most you could say would be that
you have a unidirectionally growing array, and it might not be worth
producing a whole new array structure for just this sort of optimization.
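(For what it's worth, the Perl 5 fields pragma already does roughly this kind
of compile-time fixing via typed lexicals; a minimal sketch of the existing
Perl 5 mechanism, not the perl6 syntax I'm suggesting:)

    package Date;
    use fields qw(month day year);       # field set is fixed at compile time

    sub new {
        my Date $self = fields::new(shift);
        $self->{month} = 10;             # with the typed lexical, these lookups
        $self->{day}   = 20;             # compile down to array indexes, and a
        $self->{year}  = 30;             # typo like $self->{yaer} is a compile error
        return $self;
    }

    package main;
    my Date $d = Date->new();
    print $d->{year}, "\n";              # 30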
>
> For :key and :initialkey arrays, the syntax
>
> $foo[$bar]
>
> would inspect $bar to determine if it is convertable to numeric. If it
> is, the value is used as the numeric index of the array. If it is not,
> it is treated as a key for the array, and is looked up in the namespace
> of the array.
Doesn't this conflict with the general usage of hashes? I know I've made
use of hashes as a form of sparse matrix / array. DB IDs, for example
(where you have 3 sequential values starting at 1,234,567,890). Basing
context on whether the subscript 'happens' to be numeric is bad (at least
for me, in DB programming). I don't like the idea of coercing it to a string
as in $foo[ "$bar" ].
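(The sort of sparse, numerically-keyed hash I mean, with made-up IDs:)

    my %row_by_id;
    $row_by_id{1234567890} = 'first';
    $row_by_id{1234567891} = 'second';
    $row_by_id{1234567892} = 'third';
    # If hashes and :key arrays get merged, a numeric-looking subscript would
    # be treated as a position rather than a name, breaking this sort of usage.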
Also, if you ever mixed keys and indexes, you'd have a serious problem with
code maintenance. Try the following:
    sub foo {
        my @hash: bla bla;
        $hash{sym2} = 5;
        $hash[0]++;
        $hash{sym1} = 6;
        ...
    }
Then later we add a debug routine (for whatever reason):

    sub db {
        my $var = shift;
        print "DEBUG: sym1 = $var->{sym1}, sym2 = $var->{sym2}\n";
    }
    sub foo {
        my @hash: bla bla;
        db( \@hash );
        $hash{sym2} = 5;
        $hash[0]++;          # gotcha: idx 0 == sym1 now
        $hash{sym1} = 6;
    }
This is obviously contrived, but you get exactly the same thing in the
database world:

    my @db_results = $dbh->selectrow_array( 'select * from table' );

Here we're at the mercy of the order in which the columns were defined. If
we later add a new column anywhere other than at the end, we've invalidated
the above code. You might say that's not our fault, but I offer that during
the development phase the structure and contents change often (as I've
learned on my own). Instead, you need to either explicitly state the fields,
as in:

    my @db_results = $dbh->selectrow_array( 'select field1, field2 from table' );
Or associate a hash, as with:

    my $sth = $dbh->prepare( 'select * from table' );
    ...
    my $rh_results = $sth->fetchrow_hashref();
So we either need to never access fields numerically, or define the
positional fields with the variable declaration. Anything else will
fundamentally hurt the stability of Perl code. I don't mind dynamic this or
that, so long as a lazy developer's module doesn't hurt my code any more
than it has to.
Also, is it sane to suggest that I could do the following?
    my @hash: bla bla bla;
    $hash[ 100 ] = ...
    $hash{ field1 } = ...
    $hash{ field2 } = ...
In general, intermixing positional parameters and a dynamically assigned
hash is asking for trouble.
>
> my ( @stat_array ) = stat ( $filename );
> print "File $filename has a size of $stat_array[size] bytes.\n";
This could still work with a behind-the-scenes pseudo-hash, so long as stat
defined the structure.
Of course, you'd have to change the definition of stat to use wantarray and
do the following:

    my $rh_stats = stat( $filename );
    print "File $filename has a size of $rh_stats->{size} bytes.\n";
Obviously, the dereferencing is annoying. I'm sure there have been many
discussions on the pros and cons of pseudo-hashes, so I won't officially
suggest that you could actually hide the first index of a pseudo-hash, or
even that you could make a real hash out of it, as in:

    my %fast_hash: keys( size name ... );

(or indirectly via the use of the attributes module). I'll read through this
newsgroup to see if I can find out more about this style, since it's
starting to attract me.
I do, however, like the suggestion that positionally significant parameters
should universally make use of whatever optimized hash-like interface
finally gets adopted by perl6.
As a general comment, one of the benefits of using context modifiers like
%, $, @, [ and { was that a developer could look at an exported variable and
generally figure out what was going on. If we use attributes to define the
characteristics of a variable, then we're looking at a maintenance
nightmare.
I can see the use of:

    my $x: integer;

especially since it is of local scope. Moreover, that attribute might only
be a suggestion to perl, and on the first violation ( $x = "hi" ) it could
throw away that attribute, so that we don't get into problems with returned
values (and references).
The most common issue that I can imagine would be returned values:

    sub foo {
        ...
        return wantarray ? @array_with_attributes : \@array_with_attributes;
    }

    my @array = foo();         # no attributes are applied; we're at the
                               # mercy of default settings
    my $ra_array = foo();
    $ra_array->[ num or sym ]; # depending on the specific attributes, this
                               # might act differently than expected
We can't do any sort of compile-time checking on references. (This is why
type-checking required 'my CLASS $foo'.)
Additionally, to work properly you'd have to store run-time attributes and
meta-data within the target array. That can be completely non-obvious to the
user of a module: I just see that you've returned an array or a scalar ref,
but am I somehow supposed to figure out what field positions you've used?
Pseudo-hashes solved this problem by passing the structure along with the
array, and by requiring the use of references. Your method could return the
contents of the array / hash, and thus lose that information.
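(To make that concrete, a small sketch with made-up values: the key map only
travels with the reference, not with a flattened copy of the contents:)

    my $ph = [ { size => 1, name => 2 }, 1024, 'foo.txt' ];
    print $ph->{size}, "\n";   # 1024 -- the key map rides along in $ph->[0]

    my @flat = @$ph;           # return the *contents* instead of the ref and
                               # the caller just gets an ordinary array, with
                               # the map sitting anonymously in element 0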
In short, you've come up with a generic way to do what pseudo-hashes tried
to do, but I think you're ignoring some of the fundamental problems that
pseudo-hashes had to address. I don't think the idea is robust enough.
-Michael
p.s. Well it IS, after all, a REQUEST for comments. :)