Re: wanted: hash stress tests

Bob Rogers Sun, 22 May 2005 19:46:47 -0700

   From: Leopold Toetsch <[EMAIL PROTECTED]>
   Date: Sun, 22 May 2005 16:58:00 +0200

   Bob Rogers wrote:

   > Below please find an additional test case for t/pmc/hash.t that defines
   > 
   >>50K keys, while checking that earlier entries are still present.  This
   > 
   > takes about 0.8 sec on my 1.8GHz AMD box, tunable by increasing I22.  Is
   > this the sort of thing you had in mind?

   Yeah. Thanks. It's OTOH failing here, I've too look at it, then it'll go 
   in. Another one which additionally deletes keys would be great.

Good idea; I'll work on that.  (I assume the failure was because of the
bug I introduced into the emailed version?)

   > One question:  Why is there a space to store computed hash codes in
   > struct parrot_string_t but not in HashBucket?

   Having it in the bucket is a space penalty for other objects that can 
   provide an hash value easily like e.g. Integer PMCs. More complicated 
   objects like strings can therefore cache their hash val inside their 
   structure.

Hmm.  Seems to me that this space penalty is proportional to the number
of keys for keeping it in the HashBucket, and proportional to the number
of objects for keeping it in the objects.  Since the number of string
keys (e.g.) must be smaller than the total number of strings, and may be
much smaller, it seems like it would be better to keep them in the
HashBucket, even though that's somewhat redundant for such things as
Integer PMCs.  True?

   > Not to mention the fact that initializing it randomly would make it
   > harder to test hash iteration, since the key enumeration order would
   > then be nondeterministic.

   Well, the reason *is* exactly to make it nondeterministic. The perl5 
   archives should have a lot of discussion WRT that, keyword DOS attacks.

If DOS attacks on hashes are a consideration, then using the seed purely
as an initializer won't be enough.  In particular, I note that
key_hash_cstring produces the same hash code for the strings "cB" and
"bc" regardless of the seed value, so they would always collide.  Doing

        h += hash->seed + *p++;

in the iteration would make it harder to attack, IMHO.  (But IANASE
("security expert").  ;-)

   On the other hand, I would argue that it's way too early to apply
this kind of low-level security hardening; it's a lot of work, and it's
too easy for someone to accidentally undo it in a rapidly-evolving
system by helpfully "simplifying" the code.

   In any case, I take your point that the seed will eventually need to
be initialized randomly -- as long as it can be set deterministically
for testing/debugging purposes, I'm happy.

   >    FWIW, Common Lisp defines an SXHASH function [1] that must exhibit
   > just this invariance characteristic, and must also compute a definite
   > answer for circular structures.  Time to revive the "hash" opcode?

   It depends on how we define the equality of objects or aggregates, i.e 
   if or how deeply that might recurse . . .

True.  That is all set in stone [1] for Common Lisp, but other languages
may have other requirements.  I imagine all languages will need to be
able to compute hashes on keys from other languages, so a universal hash
operator that treats its argument as opaque would be useful.  (Since the
GC doesn't move objects, the object address may be sufficient.)  But
thinking about which language operator and/or object should decide
whether to recur into a given object for purposes of equality testing
gives me a headache . . .

                                        -- Bob

[1]   http://www.lispworks.com/documentation/HyperSpec/Body/f_equal.htm
      http://www.lispworks.com/documentation/HyperSpec/Body/f_equalp.htm

Re: wanted: hash stress tests

Reply via email to