Hi Robert,
On Fri, Mar 19, 2010 at 2:13 PM, Robert Dewar <de...@adacore.com> wrote:
> Jae Hyuk Kwak wrote:
>
>> For the issue of Modulus operation, it does not need to be divide or
>> hardware modulus operation.
>> Let me give you an example here:
>> 13 % 2 = 1
>> 13 & (2-1) = 1
>
> Yes, of course, we all know that, and of course the compiler does
> these simple optimizations. However, for computing hashes you
> typically want modulus with OTHER than a power of 2 to involve
> all bits.

The point I made is that we don't need the "OTHER". For example, when we
have 200 cases in a switch, we don't need to use 200 as the modulus; we
can simply use 256. There is no reason to tie the table size to the exact
number of cases.

>> In English, we can use AND operator as substitutes of modulus operator.
>> In the way, the size of hash table should be restricted to certain
>> values like 2, 4, 8, 16, and so on.
>> But it is not a big problem because by the nature of Hash Table, we
>> don't make the table full.
>> For example, we have 257 cases on a switch statement, then we need a
>> hash table whose bucket size is 512.
>> It will have 50% filled hash table, but 50% filled hash table is usual.
>
> Again, yes, of course, but typically power of two size for a hash table
> followed by an and that takes only the low order bits is likely to be
> a very poor hash choice.

I think we are making different assumptions about the hash function. The
hash function I have in mind does not just take the low-order bits; the
AND is applied only after the hash function has already mixed all the bits
of the key. The hash function can be any of the functions on this web page:

http://www.concentric.net/~Ttwang/tech/inthash.htm

Let me copy and paste the function here to prevent misunderstanding.
public int hash32shift(int key)
{
    key = ~key + (key << 15); // key = (key << 15) - key - 1;
    key = key ^ (key >>> 12);
    key = key + (key << 2);
    key = key ^ (key >>> 4);
    key = key * 2057;         // key = (key + (key << 3)) + (key << 11);
    key = key ^ (key >>> 16);
    return key;
}

int value = given_from_user_at_runtime();
int hashKey = hash32shift( value );
int actualHashKey = hashKey & ( 512 - 1 );
HashBucket * bucket = hashTable[ actualHashKey ];
if( bucket->key == value )
    (*(bucket->function))();

>> In order to prevent the false positives, we need to store the key
>> value as you correctly pointed.
>> So the final process will be first to the hash key value calculation
>> and then go to the hash table, and compare whether the value is
>> correct or not.
>
> Again, you can assume we all know the elementary algorithms for
> managing hash tables.
>
> One thing to realize is that hashing only has good average performance.
> So your O(N) is talking about average case performance, whereas the
> O(NlogN) for a tree search is worst case. That's comparing apples and
> oranges. It is worrisome to have the possibility of really bad
> performance for a particular bad luck case.

As I have already corrected, the performance of the hash-table approach
is O(1) on average, not O(N). You may think it is not a proper comparison,
but I am comparing a tree search with a hash-table lookup. Let me cite a
sentence from Wikipedia:

http://en.wikipedia.org/wiki/Hash_table

"In many situations, hash tables turn out to be more efficient than
search trees or any other table lookup structure."

Thanks for your response anyway.

Jay