On 04/04/16 14:06, Evandro Menezes wrote:
On 04/01/16 17:52, Evandro Menezes wrote:
On 04/01/16 17:45, Wilco Dijkstra wrote:
Evandro Menezes wrote:

However, I don't think that there's a need to handle any special case
for division.  The only case when the approximation differs from
division is when the numerator is infinity and the denominator, zero,
when the approximation returns infinity and the division, NAN.  So I
don't think that it's a special case that deserves being handled. IOW,
the result of the approximate reciprocal is always needed.
  No, the result of the approximate reciprocal is not needed.

Basically, an NR approximation produces a correction factor that is very
close to 1.0, and then multiplies the previous estimate by it to get a more
accurate estimate. The final calculation for x * recip(y) is:

result = (reciprocal_correction * reciprocal_estimate) * x

while what I am suggesting is a trivial reassociation:

result = reciprocal_correction * (reciprocal_estimate * x)

The computation of the final reciprocal_correction is on the critical latency
path, while reciprocal_estimate is computed earlier, so we can compute
(reciprocal_estimate * x) without increasing the overall latency. I.e. we save
a multiply on the critical path.
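
To make the reassociation concrete, here is a minimal sketch in plain C. It
is not the code from the patch: recip_estimate() is a hypothetical stand-in
for the hardware estimate instruction, and div_plain(), div_reassoc() and the
iters parameter are names invented for illustration only. Both functions run
the same Newton-Raphson refinement and differ only in how the final three-way
product is grouped.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: a coarse
   guess at 1/y for finite y > 0, accurate to within about 1/17.  */
static double recip_estimate(double y)
{
    int e;
    double m;

    assert(y > 0.0);
    m = frexp(y, &e);                       /* y = m * 2^e, m in [0.5, 1) */
    /* Linear approximation of 1/m on [0.5, 1), rescaled by 2^-e.  */
    return ldexp(48.0 / 17.0 - 32.0 / 17.0 * m, -e);
}

/* Original grouping: (correction * estimate) * x.
   Both multiplies wait for the final correction factor.  */
static double div_plain(double x, double y, int iters)
{
    double est = recip_estimate(y);
    double corr;
    int i;

    for (i = 0; i < iters - 1; i++)
        est *= 2.0 - y * est;               /* earlier NR steps */
    corr = 2.0 - y * est;                   /* final correction, close to 1.0 */
    return (corr * est) * x;
}

/* Reassociated grouping: correction * (estimate * x).
   est * x does not depend on corr, so it can be computed while corr
   is still in flight; only one multiply follows the correction.  */
static double div_reassoc(double x, double y, int iters)
{
    double est = recip_estimate(y);
    double corr, xe;
    int i;

    for (i = 0; i < iters - 1; i++)
        est *= 2.0 - y * est;               /* earlier NR steps */
    xe = est * x;                           /* off the critical path */
    corr = 2.0 - y * est;                   /* final correction, close to 1.0 */
    return corr * xe;
}
```

With a few iterations, both groupings should agree with x / y up to rounding
in the last bits; the only intended difference is the length of the
dependency chain after the final correction factor.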

In principle, this could be done as a separate optimization pass that tries
to reassociate to reduce latency. However, I'm not too convinced this would be
easy to implement in GCC's scheduler, so it's best to do it explicitly.

I think that I see what you mean.  I'll hack something tomorrow.

   [AArch64] Emit division using the Newton series

   2016-04-04  Evandro Menezes  <e.mene...@samsung.com>
                Wilco Dijkstra <wilco.dijks...@arm.com>

   gcc/
            * config/aarch64/aarch64-tuning-flags.def
            * config/aarch64/aarch64-protos.h
            (AARCH64_APPROX_MODE): New macro.
            (AARCH64_EXTRA_TUNE_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}):
            New tuning macros.
            (tune_params): Add new member "approx_div_modes".
            (aarch64_emit_approx_div): Declare new function.
            * config/aarch64/aarch64.c
            (generic_tunings): New member "approx_div_modes".
            (cortexa35_tunings): Likewise.
            (cortexa53_tunings): Likewise.
            (cortexa57_tunings): Likewise.
            (cortexa72_tunings): Likewise.
            (exynosm1_tunings): Likewise.
            (thunderx_tunings): Likewise.
            (xgene1_tunings): Likewise.
            (aarch64_emit_approx_div): Define new function.
            * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
            * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
            * config/aarch64/aarch64.opt (-mlow-precision-div): Add new
            option.
            * doc/invoke.texi (-mlow-precision-div): Describe new option.


This version of the patch has a shorter dependency chain at the last iteration of the series.

Ping^1

--
Evandro Menezes
