This and other RFCs are available on the web at
http://dev.perl.org/rfc/
=head1 TITLE
Keyed arrays
=head1 VERSION
Maintainer: Glenn Linderman <[EMAIL PROTECTED]>
Date: 20 Sep 2000
Mailing List: [EMAIL PROTECTED]
Number: 268
Version: 1
Status: Developing
=head1 ABSTRACT
The idea here is to merge the concepts of array and hash. The
resultant data structure is more costly than either, for some
operations, may be more efficient than hashes for some operations, and
may provide a more compatible solution for some types of problems than
current RFCs.
=head1 DESCRIPTION
These are not pseudo-hashes. Michael Schwern, author of RFC 241:
Pseudo-hashes must die! after some lengthy discussion, said: <quote> Ya
know, I'm going to say that $aref->[string] might be made to work where
pseudo-hashes failed. </quote>. However, that is not intended to mean
that Mr. Schwern endorses this proposal, but I did take it as
encouragement to develop this RFC to squeak in before the deadline and
hope that it lends just a bit credence to the proposal. That said...
=head2 Definition syntax
An array could have an optional namespace. Probably best to call these
keys, because it really is the same concept as hash keys--we look up
values using them. Probably there could be several flavors, as
described below. I'm not yet certain if all these flavors are useful,
or if all these flavors, even if useful, are worth the pain of
implementation, since I don't know the cost of implementation. There
may be flavors I haven't thought of yet, that would be worth
implementing. All but the first are generically called keyed arrays.
Here's my current concept list:
my/our @array :nokey;
my/our @array :key;
my/our @array :initialkey ( key0, key1, ... );
my/our @array :keyonly;
my/our @array :hashsyntax;
With the :nokey attribute, we would have a familiar perl<6 array. Use
numeric indexes, and [] indexing syntax.
The proposed default attribute would be :key, which would allow, but
not require, keys to be added to point to particular array elements,
and numeric indexing could still be used for speed.
The :initialkey variation would specify in the definition a list of
keys which would correlate to the array indexes 0, 1, ...
The :keyonly variation would be less efficient, and would require use
of keys for lookup. Of course, the compiler could translate fixed keys
into numeric indexes under the covers for performance.
The :hashsyntax variation is identical to the :keyonly variation,
except for the syntax, which is like hashes instead of like arrays.
This variation could unify the concepts of hash and array. The
definition of a :hashsyntax array should probably reserve a special
pointer in the symbol table so that the similarly named hash would be
automatically defined for access, but would actually refer to the
:hashsyntax keyed array. In other words, the definition of
my/our @array :hashsyntax;
would hide the definition of %array in the same way that
my/our %array
would hide a prior definition of %array. And references to %array
would thenceforth actually be references to the keyed array @array.
=head2 Reference syntax
The syntaxes
$foo['element']
$foo{element]
or for :hashsyntax arrays, either the above or
$foo{'element'}
$foo{element}
would be interpreted as a reference to @foo. If the namespace for @foo
contains 'element', that member of @foo is the interpretation. If the
namespace for @foo does not contain 'element', then 'element' is added
to the namespace for @foo, the size of @foo is increased by 1, and the
member 'element' refers to the newly added item in @foo.
So, starting with
my @foo:key; # empty array
$foo ['month'] = 10; # $#foo == 1, $foo[0] == 10
$foo ['day'] = 20; # $#foo == 2, $foo [1] == 20
$foo ['year'] = 30; # $#foo = 3, $foo [2] == 30
We achieve an array with 3 elements. There is a clear parallel between
this and
my %foo;
$foo{'month'} = 10;
$foo{'day'} = 20;
$foo{'year'} = 30;
However, the lookups for @foo are done at compile time, the lookups for
%foo are done at runtime.
For :key and :initialkey arrays, the syntax
$foo[$bar]
would inspect $bar to determine if it is convertable to numeric. If it
is, the value is used as the numeric index of the array. If it is not,
it is treated as a key for the array, and is looked up in the namespace
of the array.
For :keyonly and :hashsyntax arrays, all indexes are considered to be
string keys just like hash keys, requiring lookup in the namespace of
the array.
Perl uses some heuristic to decide whether a bareword within the {} of
a hash key reference is a function or a string, the same heuristic
should be applied within [] for keyed arrays.
Splice, push, pop, shift, and unshift are not valid for :keyonly and
:hashsyntax arrays, because they access data by index, which is
prohibited for those types of keyed arrays. Perhaps splice should be
prohibited on all keyed arrays, for performance concerns. If not
prohibited, its cost rises. See the implementation section.
The operations keys, values, and each would make sense for keyed
arrays. Their syntax should be extended to accept keyed arrays as
parameters, and the semantics should be similar to that for hashes. It
should be noted that values is not necessarily the same order as the
original keyed array, since it would iterate via the keys. For :key
and :initialkey arrays, it would not even necessarily be the complete
array, because not all array entries would necessarily have keys.
=head2 Usages
=head3 Garrulous builtins
(I cloned this heading from RFC 259, thanks Damian) (I cloned the list
of functions from RFC 37, thanks Jarkko) (I cloned the list of
standardized key names from RFC 259, thanks again, Damian)
The idea here is that some functions return an array of values that are
hard to keep track of. It would be nice to retain compatibility, but
also be able to access the members of the array by name, rather than
number. Since these arrays are of fixed size, and known values, we can
apply the :initialkeys variation to these functions. The functions
remain fully compatible with current definitions and usage of their
results, however, new code could be written using the names instead of
the numbers to access the results. One example, and then the cloned
lists of details.
my ( @stat_array ) = stat ( $filename );
print "File $filename has a size of $stat_array[size] bytes.\n";
=head4 List of garrulous builtins
caller [1]
getgrent
getgrnam
getgrgid
gethostbyaddr
gethostbyname
gethostent
getnetbyaddr
getnetbyname
getnetent
getprotobyname
getprotobynumber
getprotoent
getpwent
getpwnam
getpwuid
getservbyname
getservbyport
getservent
gmtime
localtime
stat
Some changes to this list may be necessitated by other changes to Perl6.
=head4 Standardized keys
The standardized keys for these functions would be:
=over 4
=item C<stat>/C<lstat>
'dev' Device number of filesystem
'ino' Inode number
'mode' File mode (type and permissions)
'nlink' Number of (hard) links to the file
'uid' Numeric user ID of file's owner
'gid' Numeric group ID of file's owner
'rdev' The device identifier (special files
only)
'size' Total size of file, in bytes
'atime' Last access time in seconds since the
epoch
'mtime' Last modify time in seconds since the
epoch
'ctime' Inode change time in seconds since the
epoch
'blksize' Preferred block size for file system I/O
'blocks' Actual number of blocks allocated
=item C<localtime>/C<gmtime>
'sec' Second
'min' Minute
'hour' Hour
'mon' Month
'year' Year
'mday' Day of the month
'wday' Day of the week
'yday' Day of the year
'isdst' Is daylight savings time in effect
(localtime only)
=item C<caller>
'package' Name of the package from which sub was
called
'file' Name of the file from which sub was
called
'line' Line in the file from which sub was
called
'sub' Name by which sub was called
'args' Was sub called with args?
'want' Hash of values returned by want()
'eval' Text of EXPR within eval EXPR
'req' Was sub called from a C<require> (or
C<use>)?
'hints' Pragmatic hints with which sub was
compiled
'bitmask' Bitmask with which sub was compiled
=item C<getpw*>
'name' Username
'passwd' Crypted password
'uid' User ID
'gid' Group ID
'quota' Disk quota
'comment' Administrative comments
'gcos' User information
'dir' Home directory
'shell' Native shell
'expire' Expiry date of account of password
=item C<getgr*>
'name' Group name
'passwd' Group password
'gid' Group id
'members' Group members
=item C<gethost*>
'name' Official host name
'aliases' Other host names
'addrtype' Host address type
'length' Length of address
'addrs' Anonymous array of raw addresses in 'C4'
format
=item C<getnet*>
'name' Official name of netwwork
'aliases' Other names for network
'addrtype' Type of network number
'net' Network number
=item C<getproto*>
'name' Official name of protocol
'aliases' Other names for protocol
'proto' Protocol number
=item C<getserv*>
'name' Official name of service
'aliases' Other names for service
'port' Port at which service resides
'proto' Protocol to be used
=head3 Accessor functions
The :keyonly and :hashsyntax variation would allow objects built out of
keyed arrays to use (multiple) inheritance, within the set of objects
built out of keyed arrays, of course. It seems hard (to me) to inherit
from an object that is a hash for an object that is an array, and vice
versa. If the unified :hashsyntax variation is implementable, one
could easily convert (with a pragma?) existing objects to use keyed
arrays instead of hashes, and thus gain any benefits for accessor speed
as noted next.
For objects built out of them, and if such objects implement a fair
number of accessor function (see RFC 163) which are effectively fixed
value keyed lookup and store operations, the compiler could translate
the fixed key values in the accessor functions to an index internally
for speed.
=head3 Pairs
I'm not yet sure how this RFC would interact with RFC 84. Clearly
there would be some ramifications, however, should both be adopted.
=head3 Sets
I'm not yet sure how this RFC would interact with RFC 179. Clearly
there could be some ramifications, however, should both be adopted.
=head3 Private/public
RFC 188 should interact with this RFC, if both are adopted. I see no
conflict to extending RFC 188 to use its keywords with keyed arrays,
with the same semantics on the names.
=head1 IMPLEMENTATION
One possible implementation of a keyed array would be to add to the
usual array implementation a name table via a hidden hash and a hidden
numeric value, called the offset, which is initially zero.
The purpose of the name table is to provide the lookup from name to
number. For each object, the name table would contain (key,
index-base) pairs that would provide the translation from key to
index-base. Index-base is added to offset to produce the index into
the array.
The purpose of the offset is to be added to the number that results
from a key-to-index translation before you actually lookup the value.
This allows operations such as shift and unshift, which affect the
relationship of all indexes and data values to be implemented without
updating each entry in the name table. If a later lookup returns an
index less than zero, an undef is returned. So shift would decrease
offset by the shift count, and unshift would increase offset by the
shift count. (The shift count is always 1 in perl<6, but see RFC 56.)
With the above in mind, the algorithm for inserting a new entry in the
name table, would be to create a pair consisting of (key, $# - offset)
and insert it into the name table for the keyed array.
Splice is a more complex beast. Some cases degenerate to shift,
unshift, push, or pop. If not, the name table for the keyed array must
be updated to reflect the adjustment to the index for each keyed item.
Elements that are removed have their keys removed. Elements that are
inserted have no key, and can only be accessed by index. Elements that
are replaced are treated like a removal followed by an insertion.
Elements that are moved have their index-base values adjusted. It is
not clear that splice is really useful for keyed arrays.
=head1 REFERENCES
RFC 21: Subroutines: Replace C<wantarray> with a generic C<want> function
RFC 37: Positional Return Lists Considered Harmful
RFC 56: Optional 2nd argument to C<pop()> and C<shift()>
RFC 84: Replace => (stringifying comma) with => (pair constructor)
RFC 127: Sane resolution to large function returns
RFC 179: More functions from set theory to manipulate arrays
RFC 188: Private keys and methods
RFC 241: Pseudo-hashes must die!
RFC 259: Builtins: Make use of hashref context for garrulous builtins