On Wed, 29 Jan 2003 18:04, Michael G Schwern wrote: > On Tue, Jan 28, 2003 at 12:11:18PM +1300, [EMAIL PROTECTED] wrote: > > This may sound like a silly idea but ... > > > > Has anyone considered removing with the syntactic distinction between > > numeric and string indexing -- that is, between array and hash lookup? > > PHP works this way. > http://www.php.net/manual/en/language.types.array.php > > So that makes a nice case study to investigate.
Stick this in yer pipe n smoke it. I haven't actually written the module yet, just the man page. But I think it might just be crazy enough to implement :-). The restriction that a key is a scalar and a value is a blessed object could be removed, but it was a useful pre-condition for my use. The DWIMy function that tells the difference between an int and a string is at the end. =head1 NAME Container::Object - set/array/hash/bag/whatever of objects =head1 SYNOPSIS use Container::Object; $coll = Container::Object->new(); push @$coll, foo => (bless { }, "bar"); print ref $coll{foo}; # prints "bar" print ref $coll[0]; # prints "bar" print ref( ($coll->members)[0] ); # prints "bar" =head1 DESCRIPTION This modules implements a generic container of objects, that is, an unordered, ordered, or keyed collection of objects, with or without duplication of items. This means that depending on how you treat your container, it automagically behaves like a Set, Bag, Array, Hash or some perverse combination of a subset of the above. =head2 EH? In the beginning, there was the array. Then Larry empowered the Hash with Perl. Some other geezer followed up with C<Tie::ixHash>, hashes that also retain ordering. Jean-Louis Leroy wrote C<Set::Object>, which was a bit like an array but without the ordering, and no duplicate elements. And Java as well as PHP have a poorly considered bastard half and half mix of the two. This container class tries to implement as many of the above classes as possible, with DWIM and no more than O(log n) execution time being the overriding principle. =head2 Keys vs Values Keys to objects in Container::Objects MUST be unblessed, non-overloaded, scalar values. You can supply a tied scalar wherever a key is accepted, but the value is read once. Values MUST be blessed objects. They can be scalars, hashes, references, whatever. It doesn't really matter. =head2 Indexes vs Keys A very DWIMy function is used to tell the difference between a string that is intended for use as a hash key, and a whole number that is intended for use as an index lookup. This is required to get correct behaviour from looking up the element at index 0, versus the item with hash key "0". =head2 Duplicate Items It is possible to have two objects with the same hash key, or even indeed an object may appear twice in the container with a single hash key. However, if objects are to be fetched by (hash) key, and duplicate objects are found, the I<last> object inserted with that key is returned (that is, the one with the highest numerical index). This emulates the behaviour of a hash. To avoid unnecessary, stale or duplicate entries in the container lying around, you can use the C<replace> method instead of C<insert> to put your objects in. =head2 Key Fabrication If objects are inserted without hash keys, they are inserted with an empty hash key (from a user's perspective; internally a hash is computed from the object's address in memory for objects with undefined keys so the container is still well balanced). If objects are inserted with hash keys, they are added to the logical end of the container. =head2 Algorithms and execution time Simple N-bucket hashing with complete re-indexing is used for simplicity of the code and speed for the general target case. Each bucket contains a series of hash table entries, which is an array of pointers to C<hek> structures and C<SV>s. Each entry also contains a reference to the next entry in the numerical index, and copy of its hash key to fill the bucket entry to four CPU words. The bucket index is a flat array of pointers, plus a flat array of C<(index # => pointer)> pairs to various points in the index array chain; so that inserts can be performed in near constant time (the index is merely updated). =head1 CLASS METHODS =head2 new( [ [ I<key> =E<gt> ] I<value>] ... ) Return a new C<Container::Object> containing the elements passed in I<list>. The elements must be objects. Keys are optional. =head1 INSTANCE METHODS =head2 insert( [ [ I<key> =E<gt> ] I<value>] ... ) Add objects to the C<Container::Object>. Adding the same object several times is not an error, but any C<Container::Object> will contain at most one occurance of the same hash key/object pair, and the numeric index always remains continuous, no holes, starting at zero. Returns the number of elements that were actually added. When adding elements to the collection's numerical index, inserts into the list at the decided (or given) position. =head2 replace( [ [ I<key> =E<gt> ] I<value>] ... ) Replaces objects in the C<Container::Object>. Virtually identical to C<insert()>, but likes to replace elements rather than duplicate them. It is not an error if the object, key or key/value pair does or does not exist. Will only replace entries in the numerical list if the passed key is itself numeric. You probably want to use this function when treating the collection as a hash, otherwise you will end up with multiple values sharing the same key. =head2 includes( [ I<keys>, ] [ I<values> ]) Return C<true> if all the objects in I<value> are members of the C<Container::Object>. The argument list may be empty, in which case C<true> is returned. If string keys are passed, the function behaves more like Perl's C<exists> function; if numeric keys are passed, it checks that the index passed is in the valid range of the container. If C<key =E<gt> value> I<pairs> are passed, then the objects must exist at the given location. =head2 members( [I<keys>] ) Return the objects contained in the C<Container::Object> as an array, in the order that they were inserted - or, if I<list> is non-null, it is considered to be equivalent to a hash slice function (add C<undef> to the end of the list to force this behaviour - it always returns no element). Lookup failures return C<undef> in the returned list. When slicing, if an element exists twice with the same hash key, the one with the higher numerical index is returned. =head2 member( [I<key>] ) If I<key> is given, returns the member at that key, or C<undef> if it does not exist; much like the C<members()> function. However, on subsequent calls to the C<member> function, the next key in the numerical index is returned. Returns an EOT mark (I mean, C<undef>) after all the items have been returned. This means if you ask for the I<second> member of the container with C<$container-E<gt>member(1)>, the next call to C<$container-E<gt>member()> will return the I<third> member of the container - ie, the same as C<$container-E<gt>members(2)>. The I<last> call to C<$container-E<gt>member()> that returns a value will return the I<first> member of the container. After that, you'll get a single C<undef>, and then it will start at the I<first> element. In list context, this function returns all the items in the container with the passed key. =head2 size Return the number of elements in the C<Container::Object>. =head2 remove( [ I<keys>, ] [I<values>] ) Remove objects from a C<Set::Object>. Removing the same object more than once, or removing an object absent from the C<Set::Object> is not an error. Returns the number of elements that were actually removed. =head2 utsl BLOCK Use the Schwarz, Luke. This function uses BLOCK to perform a sort, setting C<$a> and C<$b> to the keys of the items in the hash and just updating index values to perform the sort. =head2 sort BLOCK This function uses BLOCK to perform an in-place sort, setting C<$a> and C<$b> to the VALUES of the items in the hash (or, rather, blessed references to them), and preserving hash keys. =head2 clear Empty this C<Container::Object>. =head2 as_string Return a textual Smalltalk-ish representation of the C<Container::Object>. Also available as overloaded operator "". =head2 intersection( [I<list>] ) Return a new C<Container::Object> containing the intersection of the C<Container::Object>s passed as arguments. Also available as overloaded operator *. The following container will actually be empty, as distinct C<key =E<gt> value> pairs are considered unique for set intersection purposes. $o = new Object; $newset = Container::Object->new(hi => $o) * Container::Object->new($o); =head2 union( [I<list>] ) Return a new C<Container::Object> containing the union of the C<Container::Object>s passed as arguments. Also available as overloaded operator +. The following container will end up with two entries of C<$o>: $o = new Object; $newset = Container::Object->new($o, $o) + Container::Object->new($o); =head2 subset( I<set> ) Return C<true> if this C<Container::Object> is a subset of I<set>. Also available as operator <=. For subset comparison, container indexes are ignored. =head2 proper_subset( I<set> ) Return C<true> if this C<Container::Object> is a proper subset of I<set> Also available as operator <. For subset comparison, container indexes are ignored. =head2 superset( I<set> ) Return C<true> if this C<Set::Object> is a superset of I<set>. Also available as operator >=. For superset comparison, container indexes are ignored. =head2 proper_superset( I<set> ) Return C<true> if this C<Container::Object> is a proper superset of I<set> Also available as operator >. For superset comparison, container indexes are ignored. # This function is used to differentiate between an integer and a # string for use by the hash container types sub ish_int { my $scalar = shift; my $i; eval { $i = _ish_int($scalar) }; if ($@) { if ($@ =~ /overload/i) { if (my $sub = UNIVERSAL::can($scalar, "(0+")) { return ish_int(&$sub($scalar)); } else { return undef; } } elsif ($@ =~ /tie/i) { my $x = $scalar; return ish_int($x); } } else { return $i; } } int _ish_int(sv) SV *sv PROTOTYPE: $ CODE: double dutch; int innit; int lp; // world famous in NZ SV * MH; // This function returns the integer value of a passed scalar, as // long as the scalar can reasonably considered to already be a // representation of an integer. This means if you want strings to // be interpreted as integers, you're going to have to add 0 to // them. if (SvMAGICAL(sv)) { // probably a tied scalar //mg_get(sv); Perl_croak(aTHX_ "Tied variables not supported"); } if (SvAMAGIC(sv)) { // an overloaded variable. need to actually call a function to // get its value. Perl_croak(aTHX_ "Overloaded variables not supported"); } if ( SvIOK(sv) ) { // IOK - the scalar is a true integer. RETVAL = SvIV(sv); } else if (SvNOK(sv)) { // NOK - the scalar is a double if (SvPOK(sv)) { // POK - the scalar is also a string. // we have to be careful; a scalar "2am" or, even worse, "2e6" // may satisfy this condition if it has been evaluated in // numeric context. Remember, we are testing that the value // could already be considered an _integer_, and AFAIC 2e6 and // 2.0 are floats, end of story. // So, we stringify the numeric part of the passed SV, turn off // the NOK bit on the scalar, so as to perform a string // comparison against the passed in value. If it is not the // same, then we almost certainly weren't given an integer. MH = Perl_newSVnv(SvNV(sv)); Perl_sv_2pv(MH, &lp); SvNOK_off(MH); if (sv_cmp(MH, sv) != 0) { XSRETURN_UNDEF; } } dutch = SvNV(sv); innit = (int)dutch; if (dutch - innit < (0.000000001)) { RETVAL = innit; } else { XSRETURN_UNDEF; } } else { XSRETURN_UNDEF; } OUTPUT: RETVAL