> On 04/04/16 14:06, Evandro Menezes wrote:
> > On 04/01/16 17:52, Evandro Menezes wrote:
> >> On 04/01/16 17:45, Wilco Dijkstra wrote:
> >>> Evandro Menezes wrote:
> >>>
> >>>> However, I don't think that there's the need to handle any special
> >>>> case for division. The only case when the approximation differs
> >>>> from division is when the numerator is infinity and the
> >>>> denominator, zero, when the approximation returns infinity and the
> >>>> division, NAN. So I don't think that it's a special case that
> >>>> deserves being handled.
> >>>> IOW, the result of the approximate reciprocal is always needed.
> >>> No, the result of the approximate reciprocal is not needed.
> >>>
> >>> Basically a NR approximation produces a correction factor that is
> >>> very close to 1.0, and then multiplies that with the previous
> >>> estimate to get a more accurate estimate. The final calculation for
> >>> x * recip(y) is:
> >>>
> >>> result = (reciprocal_correction * reciprocal_estimate) * x
> >>>
> >>> while what I am suggesting is a trivial reassociation:
> >>>
> >>> result = reciprocal_correction * (reciprocal_estimate * x)
> >>>
> >>> The computation of the final reciprocal_correction is on the
> >>> critical latency path, while reciprocal_estimate is computed
> >>> earlier, so we can compute (reciprocal_estimate * x) without
> >>> increasing the overall latency. Ie. we saved a multiply.
> >>>
> >>> In principle this could be done as a separate optimization pass that
> >>> tries to reassociate to reduce latency. However I'm not too
> >>> convinced this would be easy to implement in GCC's scheduler, so
> >>> it's best to do it explicitly.
> >>
> >> I think that I see what you mean. I'll hack something tomorrow.
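For readers following the thread, the reassociation Wilco describes can be
sketched numerically. This is only an illustration of the arithmetic, not the
code from the patch: the helper names (recip_estimate, div_plain, div_reassoc)
are made up here, and recip_estimate is an assumed software stand-in for a
hardware reciprocal estimate with roughly 8 good bits. It assumes y is finite
and nonzero.

```python
import math


def recip_estimate(y):
    """Hypothetical stand-in for a hardware reciprocal estimate (~8 good bits)."""
    m, e = math.frexp(y)  # y == m * 2**e with 0.5 <= |m| < 1
    # Round 1/m to 8 fractional bits to mimic a small lookup table.
    return math.ldexp(round(256.0 / m) / 256.0, -e)


def refine(y, steps):
    """Run all but the last Newton-Raphson step: est <- est * (2 - y * est)."""
    est = recip_estimate(y)
    for _ in range(steps - 1):
        est = est * (2.0 - y * est)
    return est


def div_plain(x, y, steps=3):
    """x / y as (correction * estimate) * x: two multiplies follow the correction."""
    est = refine(y, steps)
    corr = 2.0 - y * est  # final correction factor, very close to 1.0
    return (corr * est) * x


def div_reassoc(x, y, steps=3):
    """x / y as correction * (estimate * x): one multiply follows the correction."""
    est = refine(y, steps)
    scaled = est * x      # depends only on est, so it can overlap with corr
    corr = 2.0 - y * est  # final correction factor, very close to 1.0
    return corr * scaled


if __name__ == "__main__":
    for x, y in [(1.0, 3.0), (7.5, -0.3), (1e10, 12345.678)]:
        print(x / y, div_plain(x, y), div_reassoc(x, y))
```

Both variants perform the same number of multiplications overall. The
difference is that in div_reassoc the product est * x does not wait for the
final correction, so only one multiply remains on the dependency chain after
it. The two results may differ in the last bits because floating-point
multiplication is not associative.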
> >
> > [AArch64] Emit division using the Newton series
> >
> > 2016-04-04  Evandro Menezes  <e.mene...@samsung.com>
> >             Wilco Dijkstra  <wilco.dijks...@arm.com>
> >
> > gcc/
> >     * config/aarch64/aarch64-tuning-flags.def
> >     * config/aarch64/aarch64-protos.h
> >     (AARCH64_APPROX_MODE): New macro.
> >     (AARCH64_EXTRA_TUNE_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}:
> >     New tuning macros.
> >     (tune_params): Add new member "approx_div_modes".
> >     (aarch64_emit_approx_div): Declare new function.
> >     * config/aarch64/aarch64.c
> >     (generic_tunings): New member "approx_div_modes".
> >     (cortexa35_tunings): Likewise.
> >     (cortexa53_tunings): Likewise.
> >     (cortexa57_tunings): Likewise.
> >     (cortexa72_tunings): Likewise.
> >     (exynosm1_tunings): Likewise.
> >     (thunderx_tunings): Likewise.
> >     (xgene1_tunings): Likewise.
> >     (aarch64_emit_approx_div): Define new function.
> >     * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
> >     * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
> >     * config/aarch64/aarch64.opt (-mlow-precision-div): Add new
> >     option.
> >     * doc/invoke.texi (-mlow-precision-div): Describe new option.
> >
> >
> > This version of the patch has a shorter dependency chain at the last
> > iteration of the series.
>
> Ping^1

Ping^2

--
Evandro Menezes
Austin, TX