On Mon, 2006-03-20 at 14:46 +1000, Russell Stuart wrote:
> On Sun, 2006-03-19 at 11:32 -0500, jamal wrote:
> From what I can see, you are not testing a "real" data
> set here.  Is that the case?

It is a worst-case-scenario test setup - you look up every possible
data value. Normally that is the best way to produce a metric for a
classifier (actually I left out the best-case scenario, which would
complete this - but it doesn't make a difference for us).

> If so I am not sure how to
> respond to this, other than to say it is possible to
> construct datasets that show either hash (2.4 or 2.6) to
> be better.

I have to say I am scratching my head - now that I was forced to run
the tests - to see if there is in fact a scenario where you could show
2.4 to be better...

> I am not sure if you understand what I mean
> when I say this, but perhaps my data analysis will make
> it clearer where I am coming from.

I can't access your page - it is timing out. However, again, as I
stated in my follow-up email (yesterday, after looking at the 16-bit
masks), I don't see a scenario where 2.4.x could ever produce
something better than 2.6.x in terms of lookup time plus memory use.

> By the by, I feel more comfortable with metrics than
> graphs.  Once you agree on them they take much less time
> to discuss: either the single number is better, or it
> isn't.  The "Norm" metric I introduced in the analysis
> is meant to be a crude measure of the actual runtime
> of the hash.  I am hoping you will be more comfortable
> with it than the least squares one I proposed earlier.

The least-squares metric is not bad - the only issue is that it missed
the criterion I had of looking at the number of buckets selected. If
we had no choice about changing the hash, it would have been perfect.

> On Sun, 2006-03-19 at 12:28 -0500, jamal wrote:
> > I am not so certain now you can do better than the current algorithm in
> > optimizing for lookup as well as memory use.
> > The key is for the user to look at the mask length used and conclude on
> > what the divisor should be.
>
> The "new" algorithm can emulate 2.4 or 2.6 "at will"
> through a suitable choice of mask.  If there are cases
> where 2.4 hash will run faster, then it will always
> be a better choice than either 2.4 or 2.6.  If there
> is no time when 2.4 runs faster, then we should stick
> with the 2.6 algorithm.

I don't think there is ever a time when 2.4.x does better; there are
times when it does equally well. From what I saw in those tests (and
probably did not emphasize enough), the issue is in the "possible
values" of a selected mask. If you ever have data values which end up
using fewer than 256 buckets (e.g. with masks like 0xfe0000, 0xfc00 or
0xf8), then 2.4.x will never be better - ever. Under normal
circumstances they will be equal.

> As I said in earlier post, it is possible to come up
> with fake datasets that make 2.4 run faster than 2.6.
> If you want one just ask.

I would love to see one.

> In my mind that should
> settle the issue as there is potential for any real
> dataset to approximate the fake one, but as the
> datasets are obviously fake and you don't accept my
> arguments about random data, perhaps it doesn't.  My
> analysis presents a real live data set where 2.4 is
> faster - well I think it does anyway.

The randomness of the data doesn't matter. If I were forced to use a
mask of 0xf8, 0xf800 etc., there is no way random data is going to
suddenly give me a better distribution (the sketch below shows why).
And under normal circumstances there will be no difference between the
two (random or not).
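To make the 0xf8 case concrete, here is a quick userspace sketch. The
two folds below are paraphrased from memory and simplified - not the
cls_u32.c code verbatim - and the divisor masking is likewise an
approximation of what the classifier does:

#include <stdio.h>

/* 2.4.x-style fold: XOR the bytes of the masked key together */
static unsigned int fold_24(unsigned int key, unsigned int hmask)
{
	unsigned int h = key & hmask;

	h ^= h >> 16;
	h ^= h >> 8;
	return h;
}

/* 2.6.x-style fold: shift the masked bits down so they start at bit 0 */
static unsigned int fold_26(unsigned int key, unsigned int hmask,
			    unsigned int fshift)
{
	return (key & hmask) >> fshift;
}

int main(void)
{
	const unsigned int hmask = 0xf8;  /* 5 significant bits */
	const unsigned int divisor = 32;  /* one bucket per possible value */
	const unsigned int fshift = 3;    /* lowest set bit of hmask */
	int used24[32] = {0}, used26[32] = {0};
	unsigned int key, n24 = 0, n26 = 0;

	/* worst case: walk every possible data value */
	for (key = 0; key < 256; key++) {
		used24[fold_24(key, hmask) & (divisor - 1)] = 1;
		used26[fold_26(key, hmask, fshift) & (divisor - 1)] = 1;
	}
	for (key = 0; key < divisor; key++) {
		n24 += used24[key];
		n26 += used26[key];
	}
	printf("divisor %u: 2.4 fold fills %u buckets, 2.6 fold fills %u\n",
	       divisor, n24, n26);
	return 0;
}

This prints "divisor 32: 2.4 fold fills 4 buckets, 2.6 fold fills 32".
With 0xf8 the 2.4 fold leaves the five significant bits sitting at bits
3-7, so everything lands on buckets 0, 8, 16 and 24, while the 2.6 fold
fills all 32. The only way 2.4 recovers is to burn a 256-bucket table
to use 32 of its buckets.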
To come back to randomness: the only thing it would do is favor some
buckets over others - but that would be very specific to an
environment, and if you moved to a different environment, different
buckets would be favored, etc. So you don't use that to judge a
classifier.

> One other point which occurred to me while I was hacking
> away on the weekend.  As the new hash emulates the 2.4
> hash exactly for any sane choice of mask (ie one where
> the lsb of the mask falls on a byte boundary), it is
> close to being backwards compatible with the old one.
> For example, had the 2.6 kernel used the new hash, and
> tc still computed the old one - I would never have noticed.
> The new hash also emulates the current 2.6 hash for any
> sane choice of mask - ie one no larger than 8 bits.
> Again that means the change would be almost invisible to
> current 2.6 hash users - even you.

OK now, give it up Russell ;->
We are not just going to add a couple of new instructions to the
datapath for an obscure feature that more than likely is used by fewer
than a handful of people. I certainly would not agree to it, as I take
performance very seriously. The backward compatibility (for people
using 256 buckets, mask 0xff) already exists.

Unless you have some data where 2.6.x looks worse, we could use the
time to discuss more productive things (perhaps related to u32)
instead and move on from this topic. OTOH, I don't think this
discussion was a waste of time, because it forced me (at least) to
re-validate the reasons I made the hash change. Let's move on.

cheers,
jamal