Results WAS(Re: [PATCH] TC: bug fixes to the "sample" clause

jamal Sun, 19 Mar 2006 08:33:31 -0800

Took off stoopid lartc. Please dont cc that list again since i keep
forgetting to remove it in my responses.


I only did the 8 bit test - I may go back and do the 16 bit
but now i am almost sure that the majority of the results will produce
similar results. So i dont know if i should bother. 

view the file via gnumeric ("gnumeric filename");
Initially i was trying to produce a gnuplot output, but just 
changed my mind after the first results, so the comments/headings
on 8b-256bkt are a little misleading; the data results are correct.

The test
--------

* All possible masks length 1-8
* sufficient range of data to cover 8 bits { upto 256 }
* bucket sizes of 256, 128, 64, 32, 16, 8

In the .csv you will find a set of data in a table and
then a graph visualizing the table.

The table:
---------
There are as many rows as there are buckets. 
Three columns are represented in the form of:
ideal-value old-hash new-hash
as an example:
2 16 8
implies ideal value per bucket is 2, old hash value entered
16 (out of 256) values in that bucket; the new hash alg entered 8.

The graphs:
----------
The blue straight line represents the ideal value. This helps
visualize the deviation of the two hashes.
The yellow line represents the new hash
The pink/purple line represents the old hash.
The X axis represents the hash buckets and the Y axis the number in
the buckets out of 256 possible values.
Admittedly, it would have been easier to see the plot in a bar graph,
but
it seemed to require some craftsmanship that i didnt have patience for.

Visualizing
-----------
Pick one of the masks that dont behave well - example 0xfc. On each of
the sheets "zone" on 0xfc.
Then click progressively as the hash buckets change and you will make
the same observations as i did.

Observation:
-----------

For a length of X for the mask, then the ideal value of buckets
needed is 2^X. So in the case of 0xfc (length of 7), 128 buckets
suffice.

1)If more than 2^X were used, then both hashes wasted them.

2)If less than 2^X were used, then the new hash used them effectively,
but the old one wasted them still.

The old hash is never able to utilize all the buckets except
for the following cases:
a) mask length of 8 was used - this was the case in all hash bucket size
b) mask length of 4 in the case of 256 buckets.
#a and #b amount for about < 16% of all possible cases.
This is what is tagged as a bug.

3) The new hash always is several factors better in terms of worst
case lookups. In general it was a factor of 4 better.

Conclusion
----------

Other than fixing a bug, then new hash is at least equal to the old
hash in about 16% of the cases and better in the rest(>80% of the
time). 
This is true in the case of both memory abuse and worst case lookups.

I believe there is room for improvement but it has absolutely
nothing to do with the old hash (and for that matter not the new one
either, even though the new one is a lot better).

cheers,
jamal

j1-8b-allbuckets.csv.gz
Description: GNU Zip compressed data

Results WAS(Re: [PATCH] TC: bug fixes to the "sample" clause

Reply via email to