On Tue, 3 Aug 2010 23:08:49 -0400 Kris Maglione <maglion...@gmail.com> wrote:
> On Tue, Aug 03, 2010 at 11:03:50PM -0400, Kris Maglione wrote: > >There isn't much science to hash functions, > > Let me rephrase that, given the nature of this list. There's not > much science to simple string hash functions. More correctly, if all you're using a hash function for is assigning strings to buckets in a small hash table, you can ignore most of the mathematics involved. If you have something fancier in mind, such as the Bentley-McIlroy repeated substring detection algorithm, you may need to consider the mathematical behaviour of your hash function. (The Bentley-McIlroy algorithm computes a hash function over each contiguous N-character substring of a string (which I will call a document here), and saves every N-th substring (if N=10, it saves the hashes of d[0:9], d[10:19], ...) in a hash table. If you encounter an N-char substring which is already in your hash table, you have a repeated substring; extend the match both backward and forward to find a repeated substring of maximal length. This process is guaranteed to find all repeated substrings of length 2N-1 or greater. Computing the hash function over each N-character substring is tolerably efficient because they use a polynomial hash function (almost, but not exactly like Mr. Todd's example) in which one can quickly compute h(d[m:n]) from d[m], d[n], and h(d[m-1:n-1]).) Robert Ransom
signature.asc
Description: PGP signature