On 04/04/16 14:06, Evandro Menezes wrote:
On 04/01/16 17:52, Evandro Menezes wrote:
On 04/01/16 17:45, Wilco Dijkstra wrote:
Evandro Menezes wrote:
However, I don't think that there's a need to handle any special case
for division. The only case when the approximation differs from
division is when the numerator is infinity and the denominator, zero,
when the approximation returns infinity and the division, NaN. So I
don't think that it's a special case that deserves being handled. IOW,
the result of the approximate reciprocal is always needed.
No, the result of the approximate reciprocal is not needed.
Basically, an NR approximation produces a correction factor that is very
close to 1.0, and then multiplies that with the previous estimate to get
a more accurate estimate. The final calculation for x * recip(y) is:
result = (reciprocal_correction * reciprocal_estimate) * x
while what I am suggesting is a trivial reassociation:
result = reciprocal_correction * (reciprocal_estimate * x)
The computation of the final reciprocal_correction is on the critical
latency path, while reciprocal_estimate is computed earlier, so we can
compute (reciprocal_estimate * x) without increasing the overall
latency. I.e., we save a multiply on the critical path.
In principle this could be done as a separate optimization pass that
tries to reassociate to reduce latency. However, I'm not too convinced
this would be easy to implement in GCC's scheduler, so it's best to do
it explicitly.
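The reassociation described above can be sketched in plain C. This is only an illustration under stated assumptions, not the code from the patch: recip_estimate() is a hypothetical stand-in for a hardware reciprocal estimate that keeps about 8 significant bits, nr_correction() is a hypothetical helper for the Newton-Raphson correction factor (2 - y * e), and a single refinement step before the final correction is assumed to be enough for single precision.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate:
   returns 1/y truncated to about 8 significant mantissa bits.  */
static float recip_estimate(float y)
{
    float r = 1.0f / y;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFF8000u;   /* keep sign, exponent, top 8 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Hypothetical helper: Newton-Raphson correction factor for e ~ 1/y.
   The result is very close to 1.0 when e is a good estimate.  */
static float nr_correction(float y, float e)
{
    return 2.0f - y * e;
}

/* Original association: finish the reciprocal, then multiply by x.
   Two dependent multiplies follow the final correction.  */
static float div_plain(float x, float y)
{
    float e = recip_estimate(y);
    e = e * nr_correction(y, e);     /* first refinement */
    float c = nr_correction(y, e);   /* final correction, critical path */
    return (c * e) * x;
}

/* Reassociated form: (e * x) does not depend on the final correction,
   so only one multiply follows it on the critical path.  */
static float div_reassoc(float x, float y)
{
    float e = recip_estimate(y);
    e = e * nr_correction(y, e);     /* first refinement */
    float ex = e * x;                /* independent of c */
    float c = nr_correction(y, e);   /* final correction, critical path */
    return c * ex;
}
```

Both functions perform the same number of multiplies; the difference is only in which of them has to wait for the final correction factor.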
I think that I see what you mean. I'll hack something tomorrow.
[AArch64] Emit division using the Newton series
2016-04-04 Evandro Menezes <e.mene...@samsung.com>
Wilco Dijkstra <wilco.dijks...@arm.com>
gcc/
* config/aarch64/aarch64-tuning-flags.def
* config/aarch64/aarch64-protos.h
(AARCH64_APPROX_MODE): New macro.
(AARCH64_EXTRA_TUNE_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}):
New tuning macros.
(tune_params): Add new member "approx_div_modes".
(aarch64_emit_approx_div): Declare new function.
* config/aarch64/aarch64.c
(generic_tunings): New member "approx_div_modes".
(cortexa35_tunings): Likewise.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(exynosm1_tunings): Likewise.
(thunderx_tunings): Likewise.
(xgene1_tunings): Likewise.
(aarch64_emit_approx_div): Define new function.
* config/aarch64/aarch64.md ("div<mode>3"): New expansion.
* config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
* config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
* doc/invoke.texi (-mlow-precision-div): Describe new option.
This version of the patch has a shorter dependency chain at the last
iteration of the series.
Ping^1
--
Evandro Menezes