Steve Meyers wrote:

> > > At a previous job, we tested a 32-bit hash function by running it
> > > against hundreds of thousands of unique URL's stored in our
> > > database.  We found one collision.  A 64-bit hash is billions of
> > > times better (4 billion, to be exact).
> >
> > Good to know.  I wonder how many collisions I'd find if I ran it over
> > every URL listed in the directory www.yahoo.com.
> >
> > Which 64 bit hash function did you use?  Invent your own, or something
> > "off the shelf"?
> >
>
> We found a public domain one on the net see 
>http://www.burtleburtle.net/bob/hash/evahash.html for some sample code.  It's only a 
>32-bit hash though.  However, that same page appears to have instructions for a 
>64-bit hash function as well, but I haven't tried it at all.  I'd be curious to know 
>how many collisions you find hashing all the URL's in yahoo's database :)  I don't 
>know how long that would take, but if you do it I'd like to hear the results.
>
> Since the hash function takes a key and an initial value, you could try running it 
>with two different initial values and/or keys.  This would give you effectively a 
>128-bit hash, which you could store across two fields in MySQL.  I'm guessing that 
>the 64-bit hash will probably be good enough though.

I am not understanding why having a hash and the full url in the database would not 
take care of the collisions.  Even if you had 10 collisions for a 16 bit hash (say), 
if your query was:
SELECT ... WHERE hash=thehashvalue AND url='theurl' you would get very fast lookups on 
the hash and the url comparison would not add much to the query at that point.  You 
could even do a partial index on the url, e.g.  "KEY( hash, url(200))".

b.



---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to