> On Sat, Aug 22, 2009 at 4:04 PM, Nicholas Clark <n...@ccl4.org> wrote: > > > On Sat, Aug 22, 2009 at 04:01:13PM -0500, Forrest Sheng Bao wrote: > > > Does anyone know the time complexity of searching elements in hash tables > > in > > > Perl? Suppose the number of elements is n. > > > > > > I have no idea on how Perl automatically builds the hash function and the > > > table. If you can tell me how it works, that is also helpful. I can > > analyze > > > the algorithm myself.
On Sat, Aug 22, 2009 at 04:07:39PM -0500, Forrest Sheng Bao wrote: > Oh, I mean both Perl 5 and Perl 6. I couldn't find proper list to ask this > question. So I asked in this list. There isn't really a single list to ask that question of. Perl 6 has one spec, and more than one implementation, and I don't think that the spec lays down the performance behaviour of hash tables. (But I don't know this well, and this is the right list to correct me if I'm wrong) Perl 5 has an implementation, and no spec. Perl 5's hash tables work only with string keys. (Obtect sequences with an associated length, and a flag to specify whether the octet sequence is assumed to be ISO-8859-1 or UTF-8, with all keys presented to the API as UTF-8 normalised internally to ISO-8859-1 if possible. Strictly the data structures used have a way to store a pointer to a perl scalar, but I don't think that any part of the internals uses that) Perl 5's hash tables consist of an array of linked lists. The list elements consist of a pointer to the next element, pointer to the value stored, and pointer to a structure, usually shared, containing the key (length, octets, hash value, flags). A 32 bit hash value is computed from the string key: http://perl5.git.perl.org/perl.git/blob/c8e503bfe4136506c95da755:/hv.h#l120 The bottom m bits of the hash value are used to index into the array http://perl5.git.perl.org/perl.git/blob/c8e503bfe4136506c95da755:/hv.c#l789 and then the linked list is walked looking for the item. There is no direct relationship between n, the number of elements, and m, the number of bits used. The initial array size is 8, and the array is doubled in size if the number of keys becomes too large, or any linked list becomes to large (hsplit() is the splitting function): http://perl5.git.perl.org/perl.git/blob/c8e503bfe4136506c95da755:/hv.c#l815 [Whilst I have refactored the implementation, I didn't write the original, so I have to work out the algorithm from it, and may be wrong] My understanding is that this makes Perl 5 hash tables amortised O(1). Hopefully someone else will answer the Perl 6 side properly. Nicholas Clark