On Tue, 3 Aug 2010 23:08:49 -0400
Kris Maglione <maglion...@gmail.com> wrote:

> On Tue, Aug 03, 2010 at 11:03:50PM -0400, Kris Maglione wrote:
> >There isn't much science to hash functions,
> 
> Let me rephrase that, given the nature of this list. There's not 
> much science to simple string hash functions.

More correctly, if all you're using a hash function for is assigning
strings to buckets in a small hash table, you can ignore most of the
mathematics involved.

If you have something fancier in mind, such as the Bentley-McIlroy
repeated substring detection algorithm, you may need to consider the
mathematical behaviour of your hash function.  (The Bentley-McIlroy
algorithm computes a hash function over each contiguous N-character
substring of a string (which I will call a document here), and saves
every N-th substring (if N=10, it saves the hashes of d[0:9],
d[10:19], ...) in a hash table.  If you encounter an N-char substring
which is already in your hash table, you have a repeated substring;
extend the match both backward and forward to find a repeated substring
of maximal length.  This process is guaranteed to find all repeated
substrings of length 2N-1 or greater.  Computing the hash function over
each N-character substring is tolerably efficient because they use a
polynomial hash function (almost, but not exactly like Mr. Todd's
example) in which one can quickly compute h(d[m:n]) from d[m], d[n],
and h(d[m-1:n-1]).)

Robert Ransom

Attachment: signature.asc
Description: PGP signature

Reply via email to