On Wed, Mar 07, 2007 at 07:51:35PM -0800, David Miller wrote: > From: Stephen Hemminger <[EMAIL PROTECTED]> > Date: Wed, 07 Mar 2007 19:10:47 -0800 > > > David Miller wrote: > > > What about Willy Tarreau's supposedly even faster variant? > > > Or does this incorporate that set of improvements? > > > > > That's what this is: > > x = (2 * x + (uint32_t)div64_64(a, (uint64_t)x*(uint64_t)x)) / 3; > > Great, thanks for the clarification.
Oh BTW, I have a newer version with a first approximation of the cbrt() before the div64_64, which allows us to reduce from 3 div64 to only 2 div64. This results in a version which is twice as fast as the initial one (ncubic), but with slightly less accuracy (0.286% compared to 0.247). But I see that other functions such as hcbrt() had a 1.5% avg error, so I think this is not dramatic. Also, I managed to remove all other divides, to be kind with CPUs having a slow divide instruction or no divide at all. Since we compute on limited range (22 bits), we can multiply then shift right. It shows me even slightly better time on pentium-m and athlon, with a slightly higher avg error (0.297% compared to 0.286%), and slightly smaller code. I just have to clean experiments from my code to provide a patch. David, Stephen, are you interested ? $ ./bictcp fls(0)=0, fls(1)=1, fls(256)=9 Calibrating Function clocks mean(us) max(us) std(us) Avg error bictcp 936 0.61 24.28 1.99 0.172% ocubic 886 0.57 23.51 3.18 0.274% ncubic 644 0.42 16.59 2.18 0.247% ncubic32 444 0.29 11.47 1.50 0.247% ncubic32_1 444 0.29 11.56 1.88 0.238% ncubic32b3 337 0.22 8.67 0.88 0.286% ncubic_ndiv3 329 0.21 8.46 0.69 0.297% acbrt 707 0.46 18.05 0.80 0.275% hcbrt 644 0.42 16.44 0.51 1.580% Regards, Willy - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html