Re: [AArch64] Emit division using the Newton series

Evandro Menezes Fri, 01 Apr 2016 12:48:13 -0700

On 04/01/16 08:58, Wilco Dijkstra wrote:

Evandro Menezes wrote:
On 03/23/16 11:24, Evandro Menezes wrote:

On 03/17/16 15:09, Evandro Menezes wrote:

This patch implements FP division by an approximation using the Newton
series.


With this patch, DF division is sped up by over 100% and SF division,
zilch, both on A57 and on M1.

Mentioning throughput is not useful given that the vectorized single precision
case will give most of the speedup in actual code.

         gcc/
             * config/aarch64/aarch64-tuning-flags.def
             (AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}): New tuning macros.
             * config/aarch64/aarch64-protos.h
             (AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
             (aarch64_emit_approx_div): Declare new function.
             * config/aarch64/aarch64.c
             (aarch64_emit_approx_div): Define new function.
             * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
             * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.


This version of the patch cleans up the changes to the MD files and
optimizes the division when the numerator is 1.0.
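
For readers unfamiliar with the technique, here is a minimal sketch in portable C of division via a Newton-series reciprocal. It is not the code from the patch: the helper names approx_recip and approx_div, the linear seed and the fixed count of four refinement steps are all assumptions made for illustration. On AArch64 the seed would more plausibly come from a reciprocal-estimate instruction such as FRECPE, with FRECPS supplying the (2 - d * x) term.

```c
#include <math.h>

/* Hypothetical helper (not from the patch): approximate 1/d for a
   finite, nonzero, normal d using Newton-Raphson refinement.  */
static double approx_recip(double d)
{
    int e;
    /* Split |d| = m * 2^e with m in [0.5, 1).  */
    double m = frexp(fabs(d), &e);

    /* Linear seed for 1/m; its relative error is at most 1/17.  */
    double x = 48.0 / 17.0 - 32.0 / 17.0 * m;

    /* Each step x' = x * (2 - m * x) roughly squares the relative
       error, so four steps are ample for double precision.  */
    for (int i = 0; i < 4; i++)
        x = x * (2.0 - m * x);

    /* Undo the scaling and restore the sign.  */
    x = ldexp(x, -e);
    return d < 0 ? -x : x;
}

/* Hypothetical helper: n / d as n * (1/d).  When the numerator is 1.0
   the final multiplication is skipped, mirroring the special case
   mentioned above.  */
static double approx_div(double n, double d)
{
    double r = approx_recip(d);
    return n == 1.0 ? r : n * r;
}
```

The sketch ignores zeros, infinities, NaNs and subnormals, and its result can differ from an exactly rounded division in the last bits.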

Adding support for plain recip is good. Having the enabling logic no longer in
the md file is an improvement, but I don't believe adding tuning flags for the
inner mode is correct - we need a more generic solution like I mentioned in my
other mail.

The division variant should use the same latency reduction trick I mentioned 
for sqrt.


Wilco,

I don't think that it applies here, since it doesn't have to deal with special cases.

As for the finer grained flags, I'll wait for the feedback on https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00089.html


Thank you,

--
Evandro Menezes
