On 04/01/16 08:58, Wilco Dijkstra wrote:
Evandro Menezes wrote:
On 03/23/16 11:24, Evandro Menezes wrote:
On 03/17/16 15:09, Evandro Menezes wrote:
This patch implements FP division by an approximation using the Newton
series.
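
For readers unfamiliar with the technique, here is a minimal sketch of
division via a Newton-refined reciprocal. It is not the code in the patch.
The names recip_estimate and approx_div are hypothetical, the linear initial
estimate merely stands in for a hardware reciprocal-estimate instruction, the
step count is left to the caller, and special cases (zero, infinity, NaN,
denormals) are deliberately ignored.

```c
#include <math.h>

/* Hypothetical helper: crude reciprocal estimate, standing in for a
   hardware estimate instruction.  Assumes d is finite and nonzero.  */
static double recip_estimate(double d)
{
    int e;
    double m = frexp(d, &e);   /* d = m * 2^e with 0.5 <= |m| < 1 */
    /* Linear fit to 1/m on [0.5, 1), mirrored for negative m;
       relative error is at most 1/17.  */
    double x = copysign(48.0 / 17.0, m) - (32.0 / 17.0) * m;
    return ldexp(x, -e);       /* undo the scaling: 1/d = (1/m) * 2^-e */
}

/* Hypothetical helper: approximate n / d by refining 1/d with `steps`
   Newton iterations, then multiplying by the numerator.  */
static double approx_div(double n, double d, int steps)
{
    double x = recip_estimate(d);
    for (int i = 0; i < steps; i++)
        x = x * (2.0 - d * x); /* Newton step for f(x) = 1/x - d */
    return n * x;
}
```

Each step squares the relative error of the estimate, so the number of steps
needed depends on the precision of the starting estimate and on the target
mode. When the numerator is 1.0, the final multiply is unnecessary.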

With this patch, DF division is sped up by over 100%, while SF division
shows no speedup, on both A57 and M1.
Mentioning throughput is not useful given that the vectorized single precision
case will give most of the speedup in actual code.

         gcc/
             * config/aarch64/aarch64-tuning-flags.def
             (AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}): New tuning macros.
             * config/aarch64/aarch64-protos.h
             (AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
             (aarch64_emit_approx_div): Declare new function.
             * config/aarch64/aarch64.c
             (aarch64_emit_approx_div): Define new function.
             * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
             * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.


This version of the patch cleans up the changes to the MD files and
optimizes the division when the numerator is 1.0.
Adding support for plain recip is good. Having the enabling logic no longer in
the md file is an improvement, but I don't believe adding tuning flags for the
inner mode is correct - we need a more generic solution like I mentioned in my
other mail.

The division variant should use the same latency reduction trick I mentioned 
for sqrt.

Wilco,

I don't think that it applies here, since it doesn't have to deal with special cases.

As for the finer grained flags, I'll wait for the feedback on https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00089.html

Thank you,

--
Evandro Menezes
