On Tuesday 01 August 2006 00:34, Jim Wilson wrote: > Denis Vlasenko wrote: > > I still cannot figure out what precision is, so I restricted new code to > > (n == HOST_BITS_PER_WIDE_INT && precision == HOST_BITS_PER_WIDE_INT) case. > > Need help here. > > At the moment, there is probably no one who understands this code as > well as you do, so you may not get much help from others. > > I looked at this a little, and I think precision is the number of > significant bits in the divisor. Note that unsigned divides use N for > precision, but signed divides use N-1. Also, in the pre_shift case, > where we shift the divisors right if they have zeros in the low bits, we > subtract the shift count from the precision.
Yes. > This probably has some effect on how many bits we can safely use for the > resulting inverse multiplier. > > This code is based on a paper writte by Torbjorn Granlund and Peter > Montgomery. This was published in the 1994 SIGPLAN PLDI conference > proceedings. You should read this paper if you haven't already done so. > There may be clues in there about how the gcc algorithm works. The > gcc code was written by one of the co-authors, Torbjorn Granlund. I read it. I uploaded newer patch to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417. My code basically differs in 1) I never do 33-bit math. I do 32-bit and if the best 32-bit value of K for multiplier still doesn't work (there exist such 32-bit A that A*K/L != A/B) then I return K*2-1 (which is a 33-bit value) _without_ ever checking that it works (because it works always because of math stuff (read the paper)). Old code calculates 33-bit K (it calls it "m", not K) and needs to jumt thru hoops due to 33-bitness. 2) I do not do mhigh/mlow stuff. This is the core of my improvement. I directly calculate the smallest bad_A for which bad_A*K/L != A/B. For some values of B, it turns out that bad_A is > 2^32-1 even though mhigh/mlow method says that K is maybe bad (i.e. my method proves that this K is actually okay). It also allows for finding better K values if we know that A is bounded (from value range analysis), but current patch doesn't use it yet. My algorithm's comment (taken from patch): +/* +[below: 'div' is unsigned integer division] +['/' is real division with infinite precision] +[A,B,C... - integers, a,b,c... - reals] + +Theory: we want to calculate A div B, fast. +B is const > 2 and is not a power of 2. + +In fp: A/B = A*(L/B)/L. (If L is a large power of 2, +say 2^32, then it could be done really fast). +Let k := L/B, K := L div B + 1. +Then A/B = A*k/L. + +Then this is true: + +A div B <= A * K div L. + +For small enough A: A div B = A * K div L. +Lets find first A for which it is not true. + +Lets compare k/L and K/L. K/L is larger by a small value d: + +d := K/L - k/L = (L div B + 1) / L - L/B/L = += (L div B * B + B) / (L*B) - L/(L*B) = += (L div B * B + B - L) / (L*B) + +A*K/L is larger than A*k/L by A*d. + +A*k/L is closest to 'overflowing into next integer' +when A = N*B-1. The 'deficit' with such A is exactly 1/B. +If A*d >= 1/B, then A*K/L will 'overflow'. + +Thus bad_A >= 1/B / d = (1/B) * (L*B)/(L div B * B + B - L) = += L/(L div B * B + B - L). Then you need to 'walk up' to next +A representable as N*B-1: bad_A2 = (bad_A div B) * B + B-1 +This is the first A which will have wrong result +(i.e. for which (A*K div L) = (A div B)+1, not (A div B). -- vda