Hi Denis, On Mon, 31 Jul 2006, Jim Wilson wrote: > At the moment, there is probably no one who understands this code as > well as you do, so you may not get much help from others.
In my defence, I'm struggling to get up to speed with all of the issues. The first and obvious comments are that patches should be submitted to [EMAIL PROTECTED] rather than gcc@gcc.gnu.org, we're currently in stage 3, which is a code freeze for regression fixes, and therefore missed optimization improvements such as yours can't be committed until gcc 4.2 has branched, which means reviewers tend not to review complex patches until then, and finally you really should take a look at GCC's coding standards documentation as the formatting of your submission is just all wrong. In the meantime, however, I have been running your example code, and it does indeed find better correct multipliers than the algorithm currently used by GCC. As mentioned by Jim, the current GCC algorithm is based upon the paper by Torbjorn Granlund and Peter Montgomery, "Division by Invariant Integers using Multiplication", PLDI-94. http://swox.se/~tg/divcnst-pldi94.pdf Another excellent reference describing division by integer constant algorithms is Henry S. Warren Jr.'s "Hacker's Delight", Addision-Wesley publishers, 2003. The strange thing is that all three of GCC, the paper and the book agree on the a preferred multiplier, which differs from the one selected by your code. Following through both sets of correctness proofs is tedious, but it looks like Granlund's work attempts to find multipliers whose residual error is less than 1/n, which is sufficient by not necessary. Your code relies on round to zero semantics of truncating division, so that provided the residual is less than 1, we'll always truncate to the correct result. One issue is that Grunlund's derivation starts with signed division and is then extended/tweaked to handle unsigned division. Your work on the other hand, starts its derivation with unsigned division. So currently I'm trying to find the catch. For which divisions are 33-bit operations required. Your proof looks reasonable and the code does the correct thing for the divisions in your examples, though the coding style issues prevent it being used in the compiler without being cleaned up. There's also a return clause where it gives up after failing to find a suitable multiplier, where falling back to the exisiting code might be a better compromise. I've not played with your algorithm enough to determine how often it triggers. Congratulations for the impressive work! You'll forgive me if it takes a while to understand why your method is better than previous works, i.e. what it is they overlooked (or that you've overlooked). To be honest all this discrete math makes my head hurt. Indeed, when its appreciated that its not the smallest multiplier that we'd prefer, but the cheapest (using shifts/adds etc...) and that choose_multiplier is potentially intertwined with synth_mult, the world starts spinning and I need to reach for the headache tablets. Roger --