From: Leopold Toetsch <[EMAIL PROTECTED]> Date: Fri, 20 May 2005 16:07:10 +0200
I'm currently rewriting the hash implementation in src/hash.c. The new hash structure has just one piece of malloced memory with bucket pointers and buckets in one piece. But before comitting I'd like to have some stress and benchmarks tests that first show that my implementation is correct and of course that it's faster too ;-) Below please find an additional test case for t/pmc/hash.t that defines >50K keys, while checking that earlier entries are still present. This takes about 0.8 sec on my 1.8GHz AMD box, tunable by increasing I22. Is this the sort of thing you had in mind? I'll ci what I have so far as src/new_hash.c first, when it's usable. Thanks, leo One question: Why is there a space to store computed hash codes in struct parrot_string_t but not in HashBucket? Part of the reason I am asking is because I have started thinking about how to implement Common Lisp hashes, for which (a) keys can be arbitrary objects, (b) object equivalence is not defined in terms of stringification, and (c) hash computation is not necessarily cheap. The standard technique is therefore to store each key's computed hash in the bucket, which seems like it would be a win for Parrot as well. From: Leopold Toetsch <[EMAIL PROTECTED]> Date: Fri, 20 May 2005 19:34:27 +0200 Uri Guttman wrote: > here is an odd thought to add to that. since your hash is a single hunk > of ram, you could use offsets inside it instead of pointers. that means > it could be both shareable (given locks) and even writable to disk. I don't see the difference WRT shareable, It sounds like Uri meant that it could be mapped at different addresses for different processes . . . but maybe not, because then the key and value pointers would be useless. . . . but yes, the memory hunk could be written at once for freezing, which may save some cycles. More importantly it doesn't need relocation during hash_expand(), if offsets are calculated from start of mem. With some benchmarks we can see, which is better, but using offsets seems to be a good idea. This may be too obvious to be worth pointing out, but if hash internals are made persistent, then the hash code computation needs to be invariant from one interpreter instantiation to another. For this reason, the "seed" member of struct _hash is probably counterproductive. Not to mention the fact that initializing it randomly would make it harder to test hash iteration, since the key enumeration order would then be nondeterministic. FWIW, Common Lisp defines an SXHASH function [1] that must exhibit just this invariance characteristic, and must also compute a definite answer for circular structures. Time to revive the "hash" opcode? -- Bob Rogers http://rgrjr.dyndns.org/ [1] http://www.lispworks.com/documentation/HyperSpec/Body/f_sxhash.htm ------------------------------------------------------------------------ Index: t/pmc/hash.t =================================================================== --- t/pmc/hash.t (revision 8136) +++ t/pmc/hash.t (working copy) @@ -19,7 +19,7 @@ =cut -use Parrot::Test tests => 35; +use Parrot::Test tests => 36; use Test::More; output_is(<<CODE, <<OUTPUT, "Initial Hash tests"); @@ -273,6 +273,95 @@ 0 OUTPUT +## stuff them in, and check periodically that we can pull selected ones out. +pir_output_is(<<'CODE', <<OUTPUT, "stress test: lots of keys"); +.sub set_multiple_keys prototyped + .param pmc hash + .param int key_index + .param int step + .param int count + +again: + if count <= 0 goto ret + S0 = key_index + S1 = concat "key", S0 + S2 = concat "value", S0 + hash[S1] = S2 + key_index = key_index + step + count = count - 1 + goto again +ret: +.end + +.sub print_multiple_keys prototyped + .param pmc hash + .param int key_index + .param int step + .param int count + +again: + if count <= 0 goto ret + S0 = key_index + S1 = concat "key", S0 + print S1 + print " => " + I2 = exists hash[S1] + if I2 goto print_value + print "(undef)" + goto print_end +print_value: + S2 = hash[S1] + print S2 + print "\n" + key_index = key_index + step + count = count - 1 + goto again +ret: +.end + +.sub _main @MAIN + new P30, .Hash + print "round 1\n" + I29 = 1 + I30 = 1000 + I31 = 1000 + set_multiple_keys(P30, I29, I30, I31) + I20 = 3 + print_multiple_keys(P30, I29, I30, I20) + print "round 2\n" + I21 = 100000 + set_multiple_keys(P30, I21, I30, I31) + print_multiple_keys(P30, I29, I30, I20) + print_multiple_keys(P30, I21, I30, I20) + print "round 3\n" + I22 = 50000 + set_multiple_keys(P30, I22, I29, I22) + print_multiple_keys(P30, I29, I30, I20) + print_multiple_keys(P30, I21, I30, I20) + print "done.\n" +.end +CODE +round 1 +key1 => value1 +key1001 => value1001 +key2001 => value2001 +round 2 +key1 => value1 +key1001 => value1001 +key2001 => value2001 +key100000 => value100000 +key101000 => value101000 +key102000 => value102000 +round 3 +key1 => value1 +key1001 => value1001 +key2001 => value2001 +key50000 => value50000 +key51000 => value51000 +key52000 => value52000 +done. +OUTPUT + # Check all values after setting all of them output_is(<<CODE, <<OUTPUT, "stress test: loop(set), loop(check)"); new P0, .Hash