Evandro Menezes wrote:
On 03/23/16 11:24, Evandro Menezes wrote:
> On 03/17/16 15:09, Evandro Menezes wrote:
>> This patch implements FP division by an approximation using the Newton
>> series.
>>
>> With this patch, DF division is sped up by over 100% and SF division,
>> zilch, both on A57 and on M1.
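For readers following the thread, here is a minimal portable C sketch of what "division by a Newton approximation" means. It is not the patch's code. The names recip_estimate, recip_step and approx_div are hypothetical. recip_estimate stands in for the hardware reciprocal estimate: it uses a linear fit good to about 4 bits, whereas AArch64's FRECPE gives roughly 8. recip_step computes 2 - a*b, the value an FRECPS-style step produces. Each refinement x = x * (2 - d*x) roughly doubles the number of correct bits. The quotient n/d is then formed as n * x.

```c
#include <math.h>

/* Refinement helper modeled on an FRECPS-style step: returns 2 - a*b. */
static double recip_step(double a, double b)
{
    return 2.0 - a * b;
}

/* Hypothetical stand-in for a hardware reciprocal estimate.
   Assumes d is finite and nonzero.  Writes |d| = m * 2^e with m in
   [0.5, 1) and approximates 1/m by the linear fit 48/17 - (32/17) m,
   whose relative error is at most 1/17 (about 4 correct bits). */
static double recip_estimate(double d)
{
    int e;
    double m = frexp(fabs(d), &e);
    double x = 48.0 / 17.0 - (32.0 / 17.0) * m;
    x = ldexp(x, -e);              /* 1/|d| is approximately x * 2^-e */
    return d < 0.0 ? -x : x;
}

/* Approximate n / d with `iterations` Newton-Raphson refinements of 1/d.
   If d*x = 1 - err, one step gives d*x' = 1 - err^2, so the relative
   error squares each time.  The result is not correctly rounded. */
static double approx_div(double n, double d, int iterations)
{
    double x = recip_estimate(d);
    for (int i = 0; i < iterations; i++)
        x = x * recip_step(d, x);  /* x_{i+1} = x_i * (2 - d * x_i) */
    return n * x;
}
```

With this 4-bit seed, four steps bring 1/17 down to about 1/17^16, which is below double-precision rounding. A sharper seed such as an 8-bit hardware estimate needs fewer steps, and single precision needs fewer still. The final result can differ from the IEEE quotient by a few ulp. That is why a transformation like this is only acceptable when the user has opted into relaxed floating-point semantics.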
Mentioning throughput is not useful given that the vectorized single-precision
case will give most of the speedup in actual code.

> gcc/
>         * config/aarch64/aarch64-tuning-flags.def
>         (AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}): New tuning macros.
>         * config/aarch64/aarch64-protos.h
>         (AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
>         (aarch64_emit_approx_div): Declare new function.
>         * config/aarch64/aarch64.c
>         (aarch64_emit_approx_div): Define new function.
>         * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
>         * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
>
>
> This version of the patch cleans up the changes to the MD files and
> optimizes the division when the numerator is 1.0.

Adding support for the plain reciprocal case (numerator of 1.0) is good.
Having the enabling logic no longer in the md file is an improvement, but I
don't believe adding tuning flags for the inner mode is correct; we need a
more generic solution like I mentioned in my other mail.

The division variant should use the same latency reduction trick I mentioned
for sqrt.

Wilco