Re: time complexity of searching elements in hash tables in Perl

Nicholas Clark Sat, 22 Aug 2009 14:28:15 -0700

> On Sat, Aug 22, 2009 at 4:04 PM, Nicholas Clark <n...@ccl4.org> wrote:
> 
> > On Sat, Aug 22, 2009 at 04:01:13PM -0500, Forrest Sheng Bao wrote:
> > > Does anyone know the time complexity of searching elements in hash tables
> > in
> > > Perl? Suppose the number of elements is n.
> > >
> > > I have no idea on how Perl automatically builds the hash function and the
> > > table. If you can tell me how it works, that is also helpful. I can
> > analyze
> > > the algorithm myself.

On Sat, Aug 22, 2009 at 04:07:39PM -0500, Forrest Sheng Bao wrote:
> Oh, I mean both Perl 5 and Perl 6. I couldn't find proper list to ask this
> question. So I asked in this list.

There isn't really a single list to ask that question of.

Perl 6 has one spec, and more than one implementation, and I don't think that
the spec lays down the performance behaviour of hash tables. (But I don't know
this well, and this is the right list to correct me if I'm wrong)

Perl 5 has an implementation, and no spec.

Perl 5's hash tables work only with string keys. (Obtect sequences with an
associated length, and a flag to specify whether the octet sequence is
assumed to be ISO-8859-1 or UTF-8, with all keys presented to the API as
UTF-8 normalised internally to ISO-8859-1 if possible. Strictly the data
structures used have a way to store a pointer to a perl scalar, but I don't
think that any part of the internals uses that)

Perl 5's hash tables consist of an array of linked lists. The list elements
consist of a pointer to the next element, pointer to the value stored, and
pointer to a structure, usually shared, containing the key (length, octets,
hash value, flags).

A 32 bit hash value is computed from the string key:

http://perl5.git.perl.org/perl.git/blob/c8e503bfe4136506c95da755:/hv.h#l120

The bottom m bits of the hash value are used to index into the array

http://perl5.git.perl.org/perl.git/blob/c8e503bfe4136506c95da755:/hv.c#l789

and then the linked list is walked looking for the item.

There is no direct relationship between n, the number of elements, and m, the
number of bits used. The initial array size is 8, and the array is doubled in
size if the number of keys becomes too large, or any linked list becomes to
large (hsplit() is the splitting function):

http://perl5.git.perl.org/perl.git/blob/c8e503bfe4136506c95da755:/hv.c#l815

[Whilst I have refactored the implementation, I didn't write the original,
so I have to work out the algorithm from it, and may be wrong]

My understanding is that this makes Perl 5 hash tables amortised O(1).

Hopefully someone else will answer the Perl 6 side properly.

Nicholas Clark

Re: time complexity of searching elements in hash tables in Perl

Reply via email to