http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417
On Thursday 03 August 2006 06:55, Roger Sayle wrote: > As mentioned by Jim, the current GCC algorithm is based upon the > paper by Torbjorn Granlund and Peter Montgomery, "Division by > Invariant Integers using Multiplication", PLDI-94. > http://swox.se/~tg/divcnst-pldi94.pdf > > Another excellent reference describing division by integer constant > algorithms is Henry S. Warren Jr.'s "Hacker's Delight", Addision-Wesley > publishers, 2003. > > The strange thing is that all three of GCC, the paper and the book > agree on the a preferred multiplier, which differs from the one > selected by your code. Following through both sets of correctness > proofs is tedious, but it looks like Granlund's work attempts to > find multipliers whose residual error is less than 1/n, which is > sufficient by not necessary. Your code relies on round to zero > semantics of truncating division, so that provided the residual is > less than 1, we'll always truncate to the correct result. Yes, it works exactly that way. > One issue is that Grunlund's derivation starts with signed division > and is then extended/tweaked to handle unsigned division. Your > work on the other hand, starts its derivation with unsigned division. In my pdf downloaded from http://swox.se/~tg/divcnst-pldi94.pdf Grunlund starts with unsigned division also. Actually, my code is not very different. Method of reducing K is different, and also I borrowed from that paper the knowledge that 33bit good K always exists, so I don't canculate it at all. I calculate best possible 32-bit K, if even that is bad, then known-good 33bit one is K*2-1. I can explain why it = K*2-1 if needed. Oh, and notation is somewhat different, I used A/B before I saw the paper, and paper uses d instead of B, etc... > So currently I'm trying to find the catch. For which divisions > are 33-bit operations required. Your proof looks reasonable and > the code does the correct thing for the divisions in your examples, > though the coding style issues prevent it being used in the compiler > without being cleaned up. Style is fixed now, new attachment is added to the bug page. > There's also a return clause where it gives > up after failing to find a suitable multiplier, where falling back > to the exisiting code might be a better compromise. I've not played > with your algorithm enough to determine how often it triggers. I don't see where does it do this. The code is below my sig, please reply and quote relevant part. > To be honest all this discrete math makes my head hurt. Indeed, > when its appreciated that its not the smallest multiplier that we'd > prefer, but the cheapest (using shifts/adds etc...) and that > choose_multiplier is potentially intertwined with synth_mult, the > world starts spinning and I need to reach for the headache tablets. If "goodness" of given multiplier K can be measured somehow, (say, we have mul_cost()), you need to add it here in my code: K++; + min_cost = mul_cost(K); best_K = K; best_lgdn = lgdn; ... ... /* K is good, but maybe we can reduce it? */ while(1) { last_K = K; if (--lgdn < 0) break; if (!high_A && bad_A <= max_A) break; + cost = mul_cost(K); + if(min_cost >= cost) min_cost = cost, best_K = K, best_lgdn = lgdn; } /* Report previous parameters because we current ones are bad */ - *multiplier_ptr = last_K; - *shift_ptr = lgdn+1; + *multiplier_ptr = best_K; + *shift_ptr = best_lgdn; return 0; /* 32-bit or less */ Something like it. BTW, the most improvement will be caused by propagating data about possible range of dividend into choose_multiplier(). Code will be able to pick smaller Ks. Right now, it assumes max_A = 0xffffffff or 0x80000000. > Roger > -- -- vda int gcc_fastdiv_params(unsigned HOST_WIDE_INT B, unsigned HOST_WIDE_INT max_A, unsigned HOST_WIDE_INT *multiplier_ptr, int *shift_ptr) { unsigned HOST_WIDE_INT K, last_K, d_LB, bad_A, udummy; HOST_WIDE_INT high_A, dummy; int lgdn; /* L/B should fit into uint, so max L is floor_log2(B) << HOST_BITS_PER_WIDE_INT ** L is impicitly = 1 << (lgdn+HOST_BITS_PER_WIDE_INT) */ lgdn = floor_log2(B); /* K = L/B + 1 */ div_and_round_double(TRUNC_DIV_EXPR, 1, 0,1<<lgdn, B,(HOST_WIDE_INT)0, &K,&dummy, &udummy,&dummy); K++; /* d_LB = ((L/B) * B + B - L) */ d_LB = K * B; /* since (HOST_WIDE_INT)L == 0, subtracting it can be optimized out */ /* bad_A = L / d_LB */ div_and_round_double(TRUNC_DIV_EXPR, 1, 0,1<<lgdn, d_LB,(HOST_WIDE_INT)0, &bad_A,&high_A, &udummy,&dummy); bad_A = (bad_A / B) * B; /* ... + B-1 later */ if(high_A || bad_A > max_HOST_WIDE_INT - (B-1) ) high_A = 1; /* we aren't interested much in true value if high_A!=0 */ else bad_A += B-1; if (!high_A && bad_A <= max_A) { /* K doesn't work for all required A's. ** However, there always exist ** (potentially HOST_BITS_PER_WIDE_INT+1 bit) K which works ** for all A's and it equals K*2-1. ** We return lower HOST_BITS_PER_WIDE_INT bits in *multiplier_ptr. */ *multiplier_ptr = K*2-1; *shift_ptr = lgdn+1; /* if K > 0x800..000 then K*2-1 overflows into 32th bit, ** and we need to return 33-bit value */ return K > (1<<(HOST_BITS_PER_WIDE_INT-1)); } /* There is not much point in multiplying by even number ** and then shifting right. Reduce K & L to avoid it: */ while (!(K & 1) && lgdn) lgdn--, K >>= 1; /* K is good, but maybe we can reduce it? */ while(1) { last_K = K; if (--lgdn < 0) break; /* K = L/B + 1 */ div_and_round_double(TRUNC_DIV_EXPR, 1, 0,1<<lgdn, B,(HOST_WIDE_INT)0, &K,&dummy, &udummy,&dummy); K++; /* d_LB = ((L/B) * B + B - L) */ d_LB = K * B; /* since (HOST_WIDE_INT)L == 0, subtracting it can be optimized out */ /* bad_A = L / d_LB */ div_and_round_double(TRUNC_DIV_EXPR, 1, 0,1<<lgdn, d_LB,(HOST_WIDE_INT)0, &bad_A,&high_A, &udummy,&dummy); bad_A = (bad_A / B) * B; if(high_A || bad_A > max_HOST_WIDE_INT - (B-1) ) high_A = 1; /* we aren't interested much in true value if high_A!=0 */ else bad_A += B-1; if (!high_A && bad_A <= max_A) break; } /* Report previous parameters because we current ones are bad */ *multiplier_ptr = last_K; *shift_ptr = lgdn+1; return 0; /* 32-bit or less */ }