Re: [AArch64] Emit division using the Newton series

Evandro Menezes Tue, 12 Apr 2016 11:16:55 -0700

On 04/04/16 14:06, Evandro Menezes wrote:

On 04/01/16 17:52, Evandro Menezes wrote:

On 04/01/16 17:45, Wilco Dijkstra wrote:
Evandro Menezes wrote:
However, I don't think there's any need to handle a special case
for division.  The only case where the approximation differs from
division is when the numerator is infinity and the denominator is zero,
in which case the approximation returns infinity and the division, NAN.  So I
don't think it's a special case that deserves to be handled.  IOW,
the result of the approximate reciprocal is always needed.
  No, the result of the approximate reciprocal is not needed.
Basically, an NR approximation produces a correction factor that is very close
to 1.0, and then multiplies it by the previous estimate to get a more
accurate estimate. The final calculation for x * recip(y) is:

result = (reciprocal_correction * reciprocal_estimate) * x

while what I am suggesting is a trivial reassociation:

result = reciprocal_correction * (reciprocal_estimate * x)
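For reference, the standard Newton-Raphson reciprocal iteration behind these two formulas can be written as below. The symbols are introduced only for this sketch: y is the divisor, e_k the k-th reciprocal estimate and \varepsilon_k its relative error. It shows why the correction factor is close to 1.0, why the error squares at each step, and that the reassociation is exact in real arithmetic.

```latex
\begin{aligned}
e_{k+1} &= e_k\,(2 - y\,e_k), \qquad \varepsilon_k = 1 - y\,e_k,\\
2 - y\,e_k &= 1 + \varepsilon_k \quad (\text{the correction factor, close to } 1.0),\\
\varepsilon_{k+1} &= 1 - y\,e_k\,(1 + \varepsilon_k) = 1 - (1 - \varepsilon_k)(1 + \varepsilon_k) = \varepsilon_k^2,\\
\frac{x}{y} &\approx \big[(1 + \varepsilon_k)\,e_k\big]\,x = (1 + \varepsilon_k)\,(e_k\,x).
\end{aligned}
```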
The computation of the final reciprocal_correction is on the critical latency
path, while reciprocal_estimate is computed earlier, so we can compute
(reciprocal_estimate * x) without increasing the overall latency. I.e., we save
a multiply on the critical path.
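A minimal C sketch of the two orderings, assuming a software model of the hardware steps: recip_estimate() and recip_step() are hypothetical stand-ins that only loosely model the AArch64 FRECPE (reciprocal estimate) and FRECPS (2 - d*e correction) instructions, and div_plain()/div_reassoc() are illustrative names, not functions from the patch. The estimate is a crude linear fit good to about 4 bits, so four steps are needed for double precision.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for FRECPE: linear fit 48/17 - 32/17*m of 1/m on
   the mantissa m in [0.5, 1), relative error at most 1/17 (about 4 bits).
   Assumes d is finite and nonzero. */
static double recip_estimate(double d)
{
    int e2;
    double m = frexp(fabs(d), &e2);             /* |d| = m * 2^e2 */
    double r = 48.0 / 17.0 - 32.0 / 17.0 * m;   /* r ~ 1/m */
    return copysign(ldexp(r, -e2), d);          /* 1/d ~ sign(d) * r * 2^-e2 */
}

/* Hypothetical stand-in for FRECPS: the Newton-Raphson correction factor
   2 - d*e, which tends to 1.0 as e approaches 1/d. */
static double recip_step(double d, double e)
{
    return 2.0 - d * e;
}

/* x / d computed as (correction * estimate) * x: after the last correction
   there are two dependent multiplies. */
double div_plain(double x, double d, int steps)
{
    double e = recip_estimate(d);
    for (int i = 0; i < steps; i++)
        e = e * recip_step(d, e);
    return e * x;
}

/* x / d computed as correction * (estimate * x): estimate * x does not
   depend on the final correction, so only one multiply follows it. */
double div_reassoc(double x, double d, int steps)
{
    assert(steps >= 1);
    double e = recip_estimate(d);
    for (int i = 0; i < steps - 1; i++)
        e = e * recip_step(d, e);
    double ex = e * x;              /* can issue in parallel with c below */
    double c = recip_step(d, e);    /* final correction, on the critical path */
    return c * ex;
}
```

With four steps both variants agree with 1.0 / 3.0 to within a few ulps; the reassociation changes only where the last multiply sits in the dependency chain, not the number of iterations.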
In principle this could be done as a separate optimization pass that tries to
reassociate to reduce latency. However, I'm not too convinced this would be
easy to implement in GCC's scheduler, so it's best to do it explicitly.
I think that I see what you mean.  I'll hack something tomorrow.


   [AArch64] Emit division using the Newton series

   2016-04-04  Evandro Menezes  <e.mene...@samsung.com>
                Wilco Dijkstra <wilco.dijks...@arm.com>

   gcc/
            * config/aarch64/aarch64-tuning-flags.def
            * config/aarch64/aarch64-protos.h
            (AARCH64_APPROX_MODE): New macro.
            (AARCH64_EXTRA_TUNE_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}):
            New tuning macros.
            (tune_params): Add new member "approx_div_modes".
            (aarch64_emit_approx_div): Declare new function.
            * config/aarch64/aarch64.c
            (generic_tunings): New member "approx_div_modes".
            (cortexa35_tunings): Likewise.
            (cortexa53_tunings): Likewise.
            (cortexa57_tunings): Likewise.
            (cortexa72_tunings): Likewise.
            (exynosm1_tunings): Likewise.
            (thunderx_tunings): Likewise.
            (xgene1_tunings): Likewise.
            (aarch64_emit_approx_div): Define new function.
            * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
            * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
            * config/aarch64/aarch64.opt (-mlow-precision-div): Add new
            option.
            * doc/invoke.texi (-mlow-precision-div): Describe new option.

This version of the patch has a shorter dependency chain at the last iteration of the series.


Ping^1

--
Evandro Menezes
