Clarification: this and my first post assume familiarity with choose_multiplier.

lgup=ceiling(log2(divisor))

Currently choose_multiplier initializes mlow, mhigh at post_shift=lgup;
initially 2^n<=mlow<mhigh, so the reduction iteration must use "wide int",
which seems relatively slow compared to using fixed width integer types.
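
To make the ranges concrete, here is a minimal sketch of that initialization
for a toy word size of n=16 (with precision=n), where uint64_t stands in for
"wide int". The names ceil_log2_u16 and init_at_lgup are made up for this
illustration and are not taken from the GCC sources; d>=1 is assumed.

```c
#include <stdint.h>

/* Hypothetical helper: smallest k with 2^k >= d, for d >= 1. */
static int ceil_log2_u16(uint16_t d)
{
    int k = 0;
    while (((uint32_t)1 << k) < d)
        k++;
    return k;
}

/* Illustration only: initial mlow, mhigh at post_shift = lgup for n = 16.
   uint64_t plays the role of "wide int" because both values are >= 2^16. */
static void init_at_lgup(uint16_t d, uint64_t *mlow, uint64_t *mhigh)
{
    int lgup = ceil_log2_u16(d);
    uint64_t pow = (uint64_t)1 << (16 + lgup);        /* 2^(n+lgup) */

    *mlow = pow / d;                                  /* floor(2^(n+lgup)/d) */
    *mhigh = (pow + ((uint64_t)1 << lgup)) / d;       /* floor((2^(n+lgup)+2^lgup)/d) */
}
```

For d=7 this gives lgup=3, mlow=74898 and mhigh=74899, both above 2^16=65536,
so neither fits in a 16-bit word.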

But choose_multiplier_v2 initializes at post_shift=lgup-1, so mlow<=mhigh<2^n
and we can now avoid "wide int" in the reduction iteration. We can also
calculate mhigh from mlow without using "wide int", so we can limit using
"wide int" to calculating the initial value of mlow. Benchmarking may even
show that choose_multiplier_v4, which avoids "wide int" completely, is
faster than choose_multiplier_v2.
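
One way to get mhigh from mlow with fixed width arithmetic, sketched for a toy
word size of n=16 with precision=n and s=lgup-1. This is my own illustration
of the idea and not necessarily how choose_multiplier_v2 does it; the name
mhigh_from_mlow is made up. It assumes d>=2 and mlow=floor(2^(16+s)/d). With
r = 2^(16+s) mod d we have mhigh = mlow + floor((r + 2^s)/d), and since 2^s<d
and r<d the added term is 0 or 1.

```c
#include <stdint.h>

/* Illustration only: n = 16, s = lgup - 1, d >= 2, mlow = floor(2^(16+s)/d).
   Returns floor((2^(16+s) + 2^s)/d) using only the low 16 bits of each step. */
static uint16_t mhigh_from_mlow(uint16_t d, int s, uint16_t mlow)
{
    /* r = 2^(16+s) - mlow*d.  Because 0 <= r < d <= 65535 and 2^(16+s) is
       0 modulo 2^16, r is the low 16 bits of -(mlow*d).  The uint32_t cast
       only avoids signed overflow in C's integer promotion. */
    uint16_t r = (uint16_t)(0u - (uint32_t)mlow * d);
    uint16_t half = (uint16_t)((uint32_t)1 << s);     /* 2^s, which is < d */

    /* r + 2^s >= d, rewritten as r >= d - 2^s so nothing overflows 16 bits. */
    return (uint16_t)(mlow + (r >= (uint16_t)(d - half)));
}
```

For d=7, s=2, mlow=37449 this gives r=1 and mhigh=37449; for d=3, s=1,
mlow=43690 it gives r=2 and mhigh=43691.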

If the initial values satisfy mlow<mhigh, then we can try reducing mhigh.
But if mlow>=mhigh, then we need post_shift=lgup and an extra bit for the
multiplier, which is signified by returning 1 instead of 0.
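
A rough sketch of this case split, again for a toy word size of n=16 with
precision=n and assuming d>=2. The name choose_multiplier_sketch and its
signature are invented for this example; it is not the proposed patch. For
brevity the initial values are computed with uint64_t, while the reduction
loop runs entirely on 16-bit values.

```c
#include <stdint.h>

/* Illustration only: n = 16, d >= 2.  Returns 1 if the multiplier needs
   17 bits (only its low 16 bits are stored), otherwise 0. */
static int choose_multiplier_sketch(uint16_t d, uint16_t *multiplier,
                                    int *post_shift)
{
    int lgup = 0;
    while (((uint32_t)1 << lgup) < d)
        lgup++;

    /* Start at post_shift = lgup - 1, where both values are < 2^16. */
    int s = lgup - 1;
    uint64_t pow = (uint64_t)1 << (16 + s);
    uint16_t mlow = (uint16_t)(pow / d);
    uint16_t mhigh = (uint16_t)((pow + ((uint64_t)1 << s)) / d);

    if (mlow >= mhigh) {
        /* No 16-bit multiplier works: use post_shift = lgup instead. */
        uint64_t wide = (((uint64_t)1 << (16 + lgup))
                         + ((uint64_t)1 << lgup)) / d;
        *multiplier = (uint16_t)wide;   /* bit 16 is implied by the return */
        *post_shift = lgup;
        return 1;
    }

    /* Reduce while the halved values still leave mlow < mhigh. */
    while (s > 0 && (mlow >> 1) < (mhigh >> 1)) {
        mlow >>= 1;
        mhigh >>= 1;
        s--;
    }
    *multiplier = mhigh;
    *post_shift = s;
    return 0;
}
```

For d=3 this returns 0 with multiplier 43691 and post_shift=1. For d=7 it
returns 1 with post_shift=3 and stored multiplier 9363, i.e. the full
multiplier is 2^16+9363=74899.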
