Re: Spare brackets :-)

Sam Vilain Wed, 29 Jan 2003 02:31:11 -0800

On Wed, 29 Jan 2003 18:04, Michael G Schwern wrote:
> On Tue, Jan 28, 2003 at 12:11:18PM +1300, [EMAIL PROTECTED] wrote:
> > This may sound like a silly idea but ...
> >
> > Has anyone considered removing with the syntactic distinction between
> > numeric and string indexing -- that is, between array and hash lookup?
>
> PHP works this way.
> http://www.php.net/manual/en/language.types.array.php
>
> So that makes a nice case study to investigate.


Stick this in yer pipe n smoke it.  I haven't actually written the module 
yet, just the man page.  But I think it might just be crazy enough to 
implement :-).

The restriction that a key is a scalar and a value is a blessed object 
could be removed, but it was a useful pre-condition for my use.

The DWIMy function that tells the difference between an int and a string is 
at the end.

=head1 NAME

Container::Object - set/array/hash/bag/whatever of objects

=head1 SYNOPSIS

  use Container::Object;
  $coll = Container::Object->new();

  push @$coll, foo => (bless { }, "bar");
  print ref $coll{foo};             # prints "bar"
  print ref $coll[0];               # prints "bar"
  print ref( ($coll->members)[0] ); # prints "bar"

=head1 DESCRIPTION

This modules implements a generic container of objects, that is, an
unordered, ordered, or keyed collection of objects, with or without
duplication of items.

This means that depending on how you treat your container, it
automagically behaves like a Set, Bag, Array, Hash or some perverse
combination of a subset of the above.

=head2 EH?

In the beginning, there was the array.  Then Larry empowered the Hash
with Perl.  Some other geezer followed up with C<Tie::ixHash>, hashes
that also retain ordering.  Jean-Louis Leroy wrote C<Set::Object>,
which was a bit like an array but without the ordering, and no
duplicate elements.  And Java as well as PHP have a poorly considered
bastard half and half mix of the two.

This container class tries to implement as many of the above classes
as possible, with DWIM and no more than O(log n) execution time being
the overriding principle.

=head2 Keys vs Values

Keys to objects in Container::Objects MUST be unblessed,
non-overloaded, scalar values.  You can supply a tied scalar wherever
a key is accepted, but the value is read once.

Values MUST be blessed objects.  They can be scalars, hashes,
references, whatever.  It doesn't really matter.

=head2 Indexes vs Keys

A very DWIMy function is used to tell the difference between a string
that is intended for use as a hash key, and a whole number that is
intended for use as an index lookup.

This is required to get correct behaviour from looking up the element
at index 0, versus the item with hash key "0".

=head2 Duplicate Items

It is possible to have two objects with the same hash key, or even
indeed an object may appear twice in the container with a single hash
key.

However, if objects are to be fetched by (hash) key, and duplicate
objects are found, the I<last> object inserted with that key is
returned (that is, the one with the highest numerical index).  This
emulates the behaviour of a hash.

To avoid unnecessary, stale or duplicate entries in the container
lying around, you can use the C<replace> method instead of C<insert>
to put your objects in.

=head2 Key Fabrication

If objects are inserted without hash keys, they are inserted with an
empty hash key (from a user's perspective; internally a hash is
computed from the object's address in memory for objects with
undefined keys so the container is still well balanced).

If objects are inserted with hash keys, they are added to the logical
end of the container.

=head2 Algorithms and execution time

Simple N-bucket hashing with complete re-indexing is used for
simplicity of the code and speed for the general target case.

Each bucket contains a series of hash table entries, which is an array
of pointers to C<hek> structures and C<SV>s.  Each entry also contains
a reference to the next entry in the numerical index, and copy of its
hash key to fill the bucket entry to four CPU words.

The bucket index is a flat array of pointers, plus a flat array of
C<(index # => pointer)> pairs to various points in the index array
chain; so that inserts can be performed in near constant time (the
index is merely updated).

=head1 CLASS METHODS

=head2 new( [ [ I<key> =E<gt> ] I<value>] ... )

Return a new C<Container::Object> containing the elements passed in
I<list>.  The elements must be objects.  Keys are optional.

=head1 INSTANCE METHODS

=head2 insert( [ [ I<key> =E<gt> ] I<value>] ... )

Add objects to the C<Container::Object>.

Adding the same object several times is not an error, but any
C<Container::Object> will contain at most one occurance of the same
hash key/object pair, and the numeric index always remains continuous,
no holes, starting at zero.  Returns the number of elements that were
actually added.

When adding elements to the collection's numerical index, inserts into
the list at the decided (or given) position.

=head2 replace( [ [ I<key> =E<gt> ] I<value>] ... )

Replaces objects in the C<Container::Object>.

Virtually identical to C<insert()>, but likes to replace elements
rather than duplicate them.  It is not an error if the object, key or
key/value pair does or does not exist.

Will only replace entries in the numerical list if the passed key is
itself numeric.

You probably want to use this function when treating the collection as
a hash, otherwise you will end up with multiple values sharing the
same key.

=head2 includes( [ I<keys>, ] [ I<values> ])

Return C<true> if all the objects in I<value> are members of the
C<Container::Object>.  The argument list may be empty, in which case
C<true> is returned.

If string keys are passed, the function behaves more like Perl's
C<exists> function; if numeric keys are passed, it checks that the
index passed is in the valid range of the container.  If C<key =E<gt>
value> I<pairs> are passed, then the objects must exist at the given
location.

=head2 members( [I<keys>] )

Return the objects contained in the C<Container::Object> as an array,
in the order that they were inserted - or, if I<list> is non-null, it
is considered to be equivalent to a hash slice function (add C<undef>
to the end of the list to force this behaviour - it always returns no
element).  Lookup failures return C<undef> in the returned list.

When slicing, if an element exists twice with the same hash key, the
one with the higher numerical index is returned.

=head2 member( [I<key>] )

If I<key> is given, returns the member at that key, or C<undef> if it
does not exist; much like the C<members()> function.  However, on
subsequent calls to the C<member> function, the next key in the
numerical index is returned.  Returns an EOT mark (I mean, C<undef>)
after all the items have been returned.

This means if you ask for the I<second> member of the container with
C<$container-E<gt>member(1)>, the next call to
C<$container-E<gt>member()> will return the I<third> member of the
container - ie, the same as C<$container-E<gt>members(2)>.  The
I<last> call to C<$container-E<gt>member()> that returns a value will
return the I<first> member of the container.  After that, you'll get a
single C<undef>, and then it will start at the I<first> element.

In list context, this function returns all the items in the container
with the passed key.

=head2 size

Return the number of elements in the C<Container::Object>.

=head2 remove( [ I<keys>, ] [I<values>] )

Remove objects from a C<Set::Object>.

Removing the same object more than once, or removing an object absent
from the C<Set::Object> is not an error.

Returns the number of elements that were actually removed.

=head2 utsl BLOCK

Use the Schwarz, Luke.

This function uses BLOCK to perform a sort, setting C<$a> and C<$b> to
the keys of the items in the hash and just updating index values to perform 
the sort.

=head2 sort BLOCK

This function uses BLOCK to perform an in-place sort, setting C<$a>
and C<$b> to the VALUES of the items in the hash (or, rather, blessed
references to them), and preserving hash keys.

=head2 clear

Empty this C<Container::Object>.

=head2 as_string

Return a textual Smalltalk-ish representation of the
C<Container::Object>.  Also available as overloaded operator "".

=head2 intersection( [I<list>] )

Return a new C<Container::Object> containing the intersection of the
C<Container::Object>s passed as arguments.  Also available as
overloaded operator *.

The following container will actually be empty, as distinct C<key
=E<gt> value> pairs are considered unique for set intersection
purposes.

  $o = new Object;
  $newset = Container::Object->new(hi => $o) *
            Container::Object->new($o);

=head2 union( [I<list>] )

Return a new C<Container::Object> containing the union of the
C<Container::Object>s passed as arguments.  Also available as
overloaded operator +.

The following container will end up with two entries of C<$o>:

  $o = new Object;
  $newset = Container::Object->new($o, $o) +
            Container::Object->new($o);

=head2 subset( I<set> )

Return C<true> if this C<Container::Object> is a subset of I<set>.
Also available as operator <=.

For subset comparison, container indexes are ignored.

=head2 proper_subset( I<set> )

Return C<true> if this C<Container::Object> is a proper subset of
I<set> Also available as operator <.

For subset comparison, container indexes are ignored.

=head2 superset( I<set> )

Return C<true> if this C<Set::Object> is a superset of I<set>.
Also available as operator >=.

For superset comparison, container indexes are ignored.

=head2 proper_superset( I<set> )

Return C<true> if this C<Container::Object> is a proper superset of
I<set> Also available as operator >.

For superset comparison, container indexes are ignored.




# This function is used to differentiate between an integer and a
# string for use by the hash container types

sub ish_int {
    my $scalar = shift;

    my $i;
    eval { $i = _ish_int($scalar) };

    if ($@) {
        if ($@ =~ /overload/i) {
            if (my $sub = UNIVERSAL::can($scalar, "(0+")) {
                return ish_int(&$sub($scalar));
            } else {
                return undef;
            }
        } elsif ($@ =~ /tie/i) {
            my $x = $scalar;
            return ish_int($x);
        }
    } else {
        return $i;
    }
}


int
_ish_int(sv)
        SV *sv
PROTOTYPE: $
CODE:
  double dutch;
  int innit;
  int lp;  // world famous in NZ
  SV * MH;
  // This function returns the integer value of a passed scalar, as
  // long as the scalar can reasonably considered to already be a
  // representation of an integer.  This means if you want strings to
  // be interpreted as integers, you're going to have to add 0 to
  // them.

  if (SvMAGICAL(sv)) {
    // probably a tied scalar
    //mg_get(sv);
    Perl_croak(aTHX_ "Tied variables not supported");
  }

  if (SvAMAGIC(sv)) {
    // an overloaded variable.  need to actually call a function to
    // get its value.
    Perl_croak(aTHX_ "Overloaded variables not supported");
  }

  if ( SvIOK(sv) ) {
    // IOK - the scalar is a true integer.
    RETVAL = SvIV(sv);
  } else if (SvNOK(sv)) {
    // NOK - the scalar is a double

    if (SvPOK(sv)) {
      // POK - the scalar is also a string.

      // we have to be careful; a scalar "2am" or, even worse, "2e6"
      // may satisfy this condition if it has been evaluated in
      // numeric context.  Remember, we are testing that the value
      // could already be considered an _integer_, and AFAIC 2e6 and
      // 2.0 are floats, end of story.

      // So, we stringify the numeric part of the passed SV, turn off
      // the NOK bit on the scalar, so as to perform a string
      // comparison against the passed in value.  If it is not the
      // same, then we almost certainly weren't given an integer.

      MH = Perl_newSVnv(SvNV(sv));
      Perl_sv_2pv(MH, &lp);
      SvNOK_off(MH);
      if (sv_cmp(MH, sv) != 0) {
        XSRETURN_UNDEF;
      }
    }
    dutch = SvNV(sv);
    innit = (int)dutch;
    if (dutch - innit < (0.000000001)) {
      RETVAL = innit;
    } else {
      XSRETURN_UNDEF;
    }
  } else {
    XSRETURN_UNDEF;
  }
OUTPUT:
  RETVAL

Re: Spare brackets :-)

Reply via email to