RFC: integer division by multiply with invariant reciprocal

Joern Rennecke Sun, 25 Mar 2007 07:13:18 -0800

The strategy that the SHMEDIA port uses to do cse and loop invariant code
motion for division by invariant reciprocal can in principle be used by
any processor that has a reasonable fast instructions for count leading
sign bits (can be substituted with count leading zero bits), widening
or highpart multiply, and dynamic shifts, and that typically allows
to add a few more pseudo registers without much cost (i.e. 32 GPRS,
or efficient stack access like on x86).


Basically, the division is split into an invariant computation
which takes the divisor as input, and computes a denormalization shift
count (SHIFT), an approximate reciprocal factor (INV1), and a reciprocal
adjustment factor (INV2).
The division is then performed by multiplying the INV2 with the dividend,
shifting it right by a constant amount, subtracting that from (or adding
to) the product of INV2 with the dividend, denormalize the result, and
do an adjustment to take account oif different rounding for signed /
unsigned results.
The details of how to best compute SHIFT, INV1 and INV2, if INV2
has the same or opposite signe of INV2, and the constant shift count
depend on the target instruction set and microarchitecture.

However, not all processors are as good as the SH5 in computing the
invariant values compared to how fast they can do a straight division;
thus instead of chopping the division into multiple RTL pieces and then
allowing any kind of CSE and PRE to rip them apart, we would like some
finer control that requires a minimum number of divisions with the same
divisor before this optimization is applied.
Also, exposing this at the tree level will enable other optimizations to
work better, e.g. they can make better unrolling decisions, and take
advantage of the non-trapping nature of the division by multiply with
reciprocal.

The question is now how best to respresent the invariant operations as
trees.
We could have a special tree code for this, but then
- It would eat up a tree code.
- Producing three results at once, it is likely to be trouble for
  ssa transformations.

Having a single built-in function for this purpose would avoid the tree
code cost, but the we'd be faced with a three-valued built-in function.

So I think that the easiest way to integrate this with the rest of the
compiler is to have a target hook that emits trees to compute SHIFT, INV1
and INV2.  These might use additional temporaries, and could use standard
arithmetic and/or memory operations, machine-specific builtin functions, or
a mixture of both; but any one tree expression would compute only one
value.
Emitting the computation of the actual division using SHIFT, INV1, INV2 is
then done by a second target hook.
The parameter for the minimum number of divisions with the same divisor
to trigger this optimization should be separate from the one for floating
point.

RFC: integer division by multiply with invariant reciprocal

Reply via email to