wanted: hash stress tests

Bob Rogers Sat, 21 May 2005 12:45:18 -0700

   From: Leopold Toetsch <[EMAIL PROTECTED]>
   Date: Fri, 20 May 2005 16:07:10 +0200

   I'm currently rewriting the hash implementation in src/hash.c. The new 
   hash structure has just one piece of malloced memory with bucket 
   pointers and buckets in one piece.

   But before comitting I'd like to have some stress and benchmarks tests 
   that first show that my implementation is correct and of course that 
   it's faster too ;-)

Below please find an additional test case for t/pmc/hash.t that defines
>50K keys, while checking that earlier entries are still present.  This
takes about 0.8 sec on my 1.8GHz AMD box, tunable by increasing I22.  Is
this the sort of thing you had in mind?

   I'll ci what I have so far as src/new_hash.c first, when it's usable.

   Thanks,
   leo

One question:  Why is there a space to store computed hash codes in
struct parrot_string_t but not in HashBucket?  Part of the reason I am
asking is because I have started thinking about how to implement Common
Lisp hashes, for which (a) keys can be arbitrary objects, (b) object
equivalence is not defined in terms of stringification, and (c) hash
computation is not necessarily cheap.  The standard technique is
therefore to store each key's computed hash in the bucket, which seems
like it would be a win for Parrot as well.

   From: Leopold Toetsch <[EMAIL PROTECTED]>
   Date: Fri, 20 May 2005 19:34:27 +0200

   Uri Guttman wrote:

   > here is an odd thought to add to that. since your hash is a single hunk
   > of ram, you could use offsets inside it instead of pointers. that means
   > it could be both shareable (given locks) and even writable to disk. 

   I don't see the difference WRT shareable, 

It sounds like Uri meant that it could be mapped at different addresses
for different processes . . . but maybe not, because then the key and
value pointers would be useless.

   . . . but yes, the memory hunk could be written at once for freezing,
   which may save some cycles. More importantly it doesn't need
   relocation during hash_expand(), if offsets are calculated from start
   of mem.

   With some benchmarks we can see, which is better, but using offsets 
   seems to be a good idea.

This may be too obvious to be worth pointing out, but if hash internals
are made persistent, then the hash code computation needs to be
invariant from one interpreter instantiation to another.  For this
reason, the "seed" member of struct _hash is probably counterproductive.
Not to mention the fact that initializing it randomly would make it
harder to test hash iteration, since the key enumeration order would
then be nondeterministic.

   FWIW, Common Lisp defines an SXHASH function [1] that must exhibit
just this invariance characteristic, and must also compute a definite
answer for circular structures.  Time to revive the "hash" opcode?

                                        -- Bob Rogers
                                           http://rgrjr.dyndns.org/

[1]  http://www.lispworks.com/documentation/HyperSpec/Body/f_sxhash.htm

------------------------------------------------------------------------
Index: t/pmc/hash.t
===================================================================
--- t/pmc/hash.t        (revision 8136)
+++ t/pmc/hash.t        (working copy)
@@ -19,7 +19,7 @@

 =cut

-use Parrot::Test tests => 35;
+use Parrot::Test tests => 36;
 use Test::More;

 output_is(<<CODE, <<OUTPUT, "Initial Hash tests");
@@ -273,6 +273,95 @@
 0
 OUTPUT

+## stuff them in, and check periodically that we can pull selected ones out.
+pir_output_is(<<'CODE', <<OUTPUT, "stress test: lots of keys");
+.sub set_multiple_keys prototyped
+       .param pmc hash
+        .param int key_index
+        .param int step
+       .param int count
+
+again:
+       if count <= 0 goto ret
+       S0 = key_index
+       S1 = concat "key", S0
+       S2 = concat "value", S0
+       hash[S1] = S2
+       key_index = key_index + step
+       count = count - 1
+       goto again
+ret:
+.end
+
+.sub print_multiple_keys prototyped
+       .param pmc hash
+        .param int key_index
+        .param int step
+       .param int count
+
+again:
+       if count <= 0 goto ret
+       S0 = key_index
+       S1 = concat "key", S0
+       print S1
+       print " => "
+       I2 = exists hash[S1]
+       if I2 goto print_value
+       print "(undef)"
+       goto print_end
+print_value: 
+       S2 = hash[S1]
+       print S2
+       print "\n"
+       key_index = key_index + step
+       count = count - 1
+       goto again
+ret:
+.end
+
+.sub _main @MAIN
+       new     P30, .Hash
+       print "round 1\n"
+       I29 = 1
+       I30 = 1000
+       I31 = 1000
+       set_multiple_keys(P30, I29, I30, I31)
+       I20 = 3
+       print_multiple_keys(P30, I29, I30, I20)
+       print "round 2\n"
+       I21 = 100000
+       set_multiple_keys(P30, I21, I30, I31)
+       print_multiple_keys(P30, I29, I30, I20)
+       print_multiple_keys(P30, I21, I30, I20)
+       print "round 3\n"
+       I22 = 50000
+       set_multiple_keys(P30, I22, I29, I22)
+       print_multiple_keys(P30, I29, I30, I20)
+       print_multiple_keys(P30, I21, I30, I20)
+       print "done.\n"
+.end
+CODE
+round 1
+key1 => value1
+key1001 => value1001
+key2001 => value2001
+round 2
+key1 => value1
+key1001 => value1001
+key2001 => value2001
+key100000 => value100000
+key101000 => value101000
+key102000 => value102000
+round 3
+key1 => value1
+key1001 => value1001
+key2001 => value2001
+key50000 => value50000
+key51000 => value51000
+key52000 => value52000
+done.
+OUTPUT
+
 # Check all values after setting all of them
 output_is(<<CODE, <<OUTPUT, "stress test: loop(set), loop(check)");
        new     P0, .Hash

wanted: hash stress tests

Reply via email to