On 04/01/16 08:58, Wilco Dijkstra wrote:
Evandro Menezes wrote:
On 03/23/16 11:24, Evandro Menezes wrote:
On 03/17/16 15:09, Evandro Menezes wrote:
This patch implements FP division by an approximation using the Newton
series.
With this patch, DF division is sped up by over 100%, while SF division
sees no speedup, on both A57 and M1.
Mentioning throughput is not useful given that the vectorized single precision
case will give most of the speedup in actual code.
gcc/
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}): New tuning macros.
* config/aarch64/aarch64-protos.h
(AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
(aarch64_emit_approx_div): Declare new function.
* config/aarch64/aarch64.c
(aarch64_emit_approx_div): Define new function.
* config/aarch64/aarch64.md ("div<mode>3"): New expansion.
* config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
This version of the patch cleans up the changes to the MD files and
optimizes the division when the numerator is 1.0.
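
For readers unfamiliar with the technique, here is a rough portable C sketch
of division by a Newton-refined reciprocal, including the shortcut for a
numerator of 1.0. It is not the code from the patch: the names
recip_estimate, approx_recip and approx_div are hypothetical, the linear
initial estimate merely stands in for a hardware reciprocal estimate, and the
step count is left to the caller rather than chosen per mode.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: crude estimate of 1/d for finite, nonzero d.
   Stands in for a hardware reciprocal estimate; not taken from the patch. */
static double recip_estimate(double d)
{
    int e;
    double m = frexp(fabs(d), &e);               /* |d| = m * 2^e, m in [0.5, 1) */
    double x = 48.0 / 17.0 - (32.0 / 17.0) * m;  /* linear fit to 1/m */
    x = ldexp(x, -e);                            /* undo the scaling */
    return d < 0.0 ? -x : x;
}

/* Refine the estimate with Newton steps: x' = x * (2 - d * x).
   Each step roughly doubles the number of correct bits. */
static double approx_recip(double d, int steps)
{
    assert(d != 0.0 && isfinite(d));             /* no special-case handling here */
    double x = recip_estimate(d);
    for (int i = 0; i < steps; i++)
        x = x * (2.0 - d * x);
    return x;
}

/* n / d computed as n * (1 / d); when n is 1.0 the final multiply is skipped. */
static double approx_div(double n, double d, int steps)
{
    double x = approx_recip(d, steps);
    if (n == 1.0)
        return x;
    return n * x;
}
```

With this particular initial estimate (relative error at most 1/17), four
steps are enough to get within a few ulps for double, e.g.
approx_div(1.0, 3.0, 4). Zero, infinite and NaN inputs are deliberately not
handled in this sketch.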
Adding support for plain recip is good. Having the enabling logic no longer in
the md file is an improvement, but I don't believe adding tuning flags for the
inner mode is correct - we need a more generic solution like I mentioned in my
other mail.
The division variant should use the same latency reduction trick I mentioned
for sqrt.
Wilco,
I don't think that it applies here, since it doesn't have to deal with
special cases.
As for the finer grained flags, I'll wait for the feedback on
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00089.html
Thank you,
--
Evandro Menezes