Steve Meyers wrote: > > > At a previous job, we tested a 32-bit hash function by running it > > > against hundreds of thousands of unique URL's stored in our > > > database. We found one collision. A 64-bit hash is billions of > > > times better (4 billion, to be exact). > > > > Good to know. I wonder how many collisions I'd find if I ran it over > > every URL listed in the directory www.yahoo.com. > > > > Which 64 bit hash function did you use? Invent your own, or something > > "off the shelf"? > > > > We found a public domain one on the net see >http://www.burtleburtle.net/bob/hash/evahash.html for some sample code. It's only a >32-bit hash though. However, that same page appears to have instructions for a >64-bit hash function as well, but I haven't tried it at all. I'd be curious to know >how many collisions you find hashing all the URL's in yahoo's database :) I don't >know how long that would take, but if you do it I'd like to hear the results. > > Since the hash function takes a key and an initial value, you could try running it >with two different initial values and/or keys. This would give you effectively a >128-bit hash, which you could store across two fields in MySQL. I'm guessing that >the 64-bit hash will probably be good enough though.
I am not understanding why having a hash and the full url in the database would not take care of the collisions. Even if you had 10 collisions for a 16 bit hash (say), if your query was: SELECT ... WHERE hash=thehashvalue AND url='theurl' you would get very fast lookups on the hash and the url comparison would not add much to the query at that point. You could even do a partial index on the url, e.g. "KEY( hash, url(200))". b. --------------------------------------------------------------------- Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php