Re: [ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations

Charles Baylis Fri, 31 Jul 2015 05:24:13 -0700

On 31 July 2015 at 10:34, Ramana Radhakrishnan
<ramana.radhakrish...@foss.arm.com> wrote:
> I've tried this in the past and never been convinced that 2 iterations are 
> enough to get to stability with this given that the results are only precise 
> for 8 bits / iteration. Thus I've always believed you need 3 iterations 
> rather than 2 at which point I've never been sure that it's worth it. So the 
> testing that you've done with this currently is not enough for this to go 
> into the tree.


My understanding is that 2 iterations is sufficient for single
precision floating point (although not for double precision), because
each iteration of Newton-Raphson doubles the number of bits of
accuracy.

I haven't worked through the maths myself, but
https://en.wikipedia.org/wiki/Division_algorithm#Newton.E2.80.93Raphson_division
says:
    "This squaring of the error at each iteration step — the so-called
    quadratic convergence of Newton–Raphson's method — has the
    effect that the number of correct digits in the result roughly
    doubles for every iteration, a property that becomes extremely
    valuable when the numbers involved have many digits"

Therefore:
vrecpe -> 8 bits of accuracy
+1 iteration -> 16 bits of accuracy
+2 iterations -> 32 bits of accuracy (but in reality limited to the
precision of a 32-bit float)

Since 32 bits is more accuracy than the 24 bits of precision in a
single-precision FP value, 2 iterations should be sufficient.
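
A rough way to sanity-check the arithmetic above is to model it in plain
Python. This is only a sketch under stated assumptions: `estimate` is a
hypothetical stand-in for vrecpe that truncates the true reciprocal to about
8 bits (it is not the actual ARM estimate table), `f32` emulates
single-precision rounding, and `step` computes x * (2 - d*x), which is what
vrecps followed by a multiply evaluates.

```python
import math
import struct

def f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def estimate(d, bits=8):
    """Hypothetical stand-in for vrecpe: 1/d truncated to `bits` bits."""
    m, e = math.frexp(1.0 / d)          # 1/d = m * 2**e with 0.5 <= m < 1
    return math.ldexp(math.floor(m * 2**bits) / 2**bits, e)

def step(x, d):
    """One Newton-Raphson step, rounding to float32 after each operation."""
    return f32(x * f32(2.0 - f32(d * x)))

def correct_bits(x, d):
    """-log2 of the relative error of x as an approximation to 1/d."""
    err = abs(x * d - 1.0)
    return float("inf") if err == 0.0 else -math.log2(err)

d = f32(3.0)
x = estimate(d)
for i in range(3):
    print(i, "iterations:", round(correct_bits(x, d), 1), "bits")
    x = step(x, d)
```

For d = 3.0 this model reports 8, 16 and then 25 correct bits, i.e. the
count doubles once and is then capped by float32 rounding. It says nothing
about how the real hardware estimate behaves across the full input range,
which is what the testing requested below would have to confirm.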

> I'd like this to be tested on a couple of different AArch32 implementations 
> with a wider range of inputs to verify that the results are acceptable as 
> well as running something like SPEC2k(6) with at least one iteration to ensure 
> correctness.

I can't argue with confirming theory matches practice :)

Some corner cases (e.g. numbers around FLT_MAX, FLT_MIN etc.) may
produce denormals or out-of-range values during the reciprocal
calculation, which could make the answers less accurate than in the
typical case, but I think that is acceptable with -ffast-math.
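
To make those corner cases concrete, here is a small sketch that classifies
where the exact reciprocal of a divisor lands relative to the float32 range.
The helper name `reciprocal_class` and the constant `FLT_TRUE_MIN` are my own
labels for illustration; the arithmetic is done in double precision, so this
only shows where the intermediate 1/d would fall, not what vrecpe/vrecps
actually return for such inputs.

```python
FLT_MAX = (2.0 - 2.0**-23) * 2.0**127   # largest finite float32
FLT_MIN = 2.0**-126                     # smallest normal float32
FLT_TRUE_MIN = 2.0**-149                # smallest denormal float32

def reciprocal_class(d):
    """Classify where 1/d falls relative to the float32 range."""
    r = 1.0 / abs(d)   # computed in double, so these inputs cannot overflow
    if r > FLT_MAX:
        return "overflow"
    if r < FLT_MIN:
        return "denormal"
    return "normal"

for name, d in [("FLT_MAX", FLT_MAX), ("FLT_MIN", FLT_MIN),
                ("FLT_TRUE_MIN", FLT_TRUE_MIN), ("1.0", 1.0)]:
    print(name, "->", reciprocal_class(d))
```

In this model the reciprocal of FLT_MAX is below FLT_MIN (a denormal), and
the reciprocal of the smallest denormal is above FLT_MAX, so a division
x / d routed through the reciprocal can lose accuracy even when the final
quotient itself would be representable.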

Charles
