Took off stoopid lartc. Please dont cc that list again since i keep forgetting to remove it in my responses.
I only did the 8 bit test - I may go back and do the 16 bit but now i am almost sure that the majority of the results will produce similar results. So i dont know if i should bother. view the file via gnumeric ("gnumeric filename"); Initially i was trying to produce a gnuplot output, but just changed my mind after the first results, so the comments/headings on 8b-256bkt are a little misleading; the data results are correct. The test -------- * All possible masks length 1-8 * sufficient range of data to cover 8 bits { upto 256 } * bucket sizes of 256, 128, 64, 32, 16, 8 In the .csv you will find a set of data in a table and then a graph visualizing the table. The table: --------- There are as many rows as there are buckets. Three columns are represented in the form of: ideal-value old-hash new-hash as an example: 2 16 8 implies ideal value per bucket is 2, old hash value entered 16 (out of 256) values in that bucket; the new hash alg entered 8. The graphs: ---------- The blue straight line represents the ideal value. This helps visualize the deviation of the two hashes. The yellow line represents the new hash The pink/purple line represents the old hash. The X axis represents the hash buckets and the Y axis the number in the buckets out of 256 possible values. Admittedly, it would have been easier to see the plot in a bar graph, but it seemed to require some craftsmanship that i didnt have patience for. Visualizing ----------- Pick one of the masks that dont behave well - example 0xfc. On each of the sheets "zone" on 0xfc. Then click progressively as the hash buckets change and you will make the same observations as i did. Observation: ----------- For a length of X for the mask, then the ideal value of buckets needed is 2^X. So in the case of 0xfc (length of 7), 128 buckets suffice. 1)If more than 2^X were used, then both hashes wasted them. 2)If less than 2^X were used, then the new hash used them effectively, but the old one wasted them still. The old hash is never able to utilize all the buckets except for the following cases: a) mask length of 8 was used - this was the case in all hash bucket size b) mask length of 4 in the case of 256 buckets. #a and #b amount for about < 16% of all possible cases. This is what is tagged as a bug. 3) The new hash always is several factors better in terms of worst case lookups. In general it was a factor of 4 better. Conclusion ---------- Other than fixing a bug, then new hash is at least equal to the old hash in about 16% of the cases and better in the rest(>80% of the time). This is true in the case of both memory abuse and worst case lookups. I believe there is room for improvement but it has absolutely nothing to do with the old hash (and for that matter not the new one either, even though the new one is a lot better). cheers, jamal
j1-8b-allbuckets.csv.gz
Description: GNU Zip compressed data