Re: [AArch64] Emit square root using the Newton series

Evandro Menezes Tue, 08 Mar 2016 14:19:25 -0800

On 03/08/16 16:08, Evandro Menezes wrote:

On 02/16/16 14:56, Evandro Menezes wrote:
On 12/08/15 15:35, Evandro Menezes wrote:
Emit square root using the Newton series
   2015-12-03  Evandro Menezes  <e.mene...@samsung.com>

   gcc/
            * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare
            new function.
            * config/aarch64/aarch64-simd.md (sqrt<mode>2): New expansion and
            insn definitions.
            * config/aarch64/aarch64-tuning-flags.def
            (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
            * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new
            function.
            * config/aarch64/aarch64.md (sqrt<mode>2): New expansion and insn
            definitions.
            * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand
            option description.
            * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
This patch extends the patch that added support for implementing x^(-1/2) using the Newton series by adding support for x^(1/2) as well.
Is it OK at this point of stage 3?

Thank you,
James,
As I was saying, this patch results in some validation errors in CPU2000 benchmarks using DF. Although the algorithm proved pretty solid with a vast set of random values, I'm confused as to why some benchmarks fail to validate with this implementation of the Newton series for square root when they pass with the Newton series for reciprocal square root.
Since I had no problems with the same algorithm on x86-64, I wonder whether the initial estimate on AArch64, which offers just 8 bits where x86-64 offers 11 bits, has something to do with it. Then again, the algorithm iterated one fewer time on x86-64 than on AArch64.
Since it seems that the initial estimate is sufficient for CPU2000 to validate when using SF, I'm leaning towards restricting the Newton series for square root to SF only.
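
To put rough numbers on the estimate-width question, here is a minimal sketch in C of an idealized error model. It assumes exact arithmetic (no rounding) and an initial estimate whose relative error is exactly 2^-bits; the function name rel_error_after is made up for this illustration and is not part of the patch.

```c
#include <assert.h>

/* Idealized model (an assumption, not measured behavior): worst-case
   relative error of a reciprocal square root estimate after `steps`
   Newton iterations, starting from an estimate good to `estimate_bits`
   bits.  Rounding error is ignored. */
double rel_error_after(int estimate_bits, int steps)
{
    assert(estimate_bits >= 0 && steps >= 0);

    double e = 1.0;
    for (int i = 0; i < estimate_bits; i++)
        e *= 0.5;                       /* e = 2^-estimate_bits */

    /* One step y' = y * (3 - x*y*y) / 2 turns a relative error e
       into one of magnitude at most 1.5*e^2 + 0.5*e^3. */
    for (int i = 0; i < steps; i++)
        e = 1.5 * e * e + 0.5 * e * e * e;

    return e;
}
```

Under this model an 8-bit estimate reaches roughly 2.3e-5 after one step, 7.9e-10 after two and 9.3e-19 after three, so two steps clear the SF threshold of 2^-24 while DF needs the third to get below 2^-53. Real hardware estimates and rounding in each step will shift these figures.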
Your thoughts on the matter are appreciated,
        Add choices for the reciprocal square root approximation

        Allow a target to prefer such operation depending on the FP
   precision.

        gcc/
            * config/aarch64/aarch64-protos.h
            (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro.
            * config/aarch64/aarch64-tuning-flags.def
            (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask.
            (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise.
            * config/aarch64/aarch64.c
            (use_rsqrt_p): New argument for the mode.
            (aarch64_builtin_reciprocal): Devise mode from builtin.
            (aarch64_optab_supported_p): New argument for the mode.


        Emit square root using the Newton series

        gcc/
            * config/aarch64/aarch64-tuning-flags.def
            (AARCH64_EXTRA_TUNE_APPROX_SQRT_{DF,SF}): New tuning macros.
            * config/aarch64/aarch64-protos.h
            (aarch64_emit_approx_sqrt): Declare new function.
            * config/aarch64/aarch64.c
            (aarch64_emit_approx_sqrt): Define new function.
            * config/aarch64/aarch64.md
            (sqrt*2): New expansion and insn definitions.
            * config/aarch64/aarch64-simd.md (sqrt*2): Likewise.
            * config/aarch64/aarch64.opt
            (mlow-precision-recip-sqrt): Expand option description.
            * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.

This patch, which depends on https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00534.html, leverages the reciprocal square root approximation to emit a faster square root approximation.
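
The identity being leveraged is sqrt(x) = x * (1/sqrt(x)). The C sketch below illustrates it in software and is not the code the patch emits: the helper names are hypothetical, and the initial estimate uses the well-known bit-level approximation as a stand-in for a hardware estimate instruction, so its accuracy (roughly 5 bits) differs from the 8 bits discussed here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a hardware estimate instruction (an assumption for this
   sketch): bit-level approximation of 1/sqrt(x), good to roughly 5 bits.
   x must be positive and within the normal range of float. */
static double rsqrt_estimate(double x)
{
    float f = (float)x;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);     /* reinterpret float as integer */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Refine y ~ 1/sqrt(x) with the Newton step y' = y * (3 - x*y*y) / 2. */
double approx_rsqrt(double x, int steps)
{
    assert(x > 0.0 && steps >= 0);
    double y = rsqrt_estimate(x);
    for (int i = 0; i < steps; i++)
        y = y * (1.5 - 0.5 * x * y * y);
    return y;
}

/* sqrt(x) = x * (1/sqrt(x)).  Zero is handled apart because an exact
   reciprocal square root of 0 is infinite and 0 * inf is NaN. */
double approx_sqrt(double x, int steps)
{
    assert(x >= 0.0);
    if (x == 0.0)
        return 0.0;
    return x * approx_rsqrt(x, steps);
}
```

For example, approx_sqrt(4.0, 4) lands very close to 2.0, while a single step still leaves a visible error. The final multiplication by x adds one more rounding on top of whatever error the reciprocal square root carries.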

I have, however, encountered precision issues with DF: some benchmarks in the SPECfp CPU2000 suite fail to validate. Perhaps the initial estimate, with just 8 bits, is not good enough for the series to converge given the workloads of such benchmarks; perhaps denormals, known to occur in some of these benchmarks, result in errors. This was the motivation for splitting the tuning flags into one specific to DF and another specific to SF in the previous related patch.
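
The per-precision split can be pictured as one bit per mode in a tuning mask. The following C sketch shows only the idea; every name and value in it is hypothetical and does not reproduce the definitions in aarch64-tuning-flags.def.

```c
#include <assert.h>

/* Hypothetical names and values, for illustration only. */
enum fp_mode { MODE_SF, MODE_DF };

enum tune_flags {
    TUNE_APPROX_SQRT_SF = 1 << 0,   /* approximate sqrt allowed for SF */
    TUNE_APPROX_SQRT_DF = 1 << 1    /* approximate sqrt allowed for DF */
};

/* Return nonzero if the tuning mask enables the approximation for MODE. */
int use_approx_sqrt_p(unsigned tune, enum fp_mode mode)
{
    assert(mode == MODE_SF || mode == MODE_DF);
    unsigned mask = (mode == MODE_SF) ? TUNE_APPROX_SQRT_SF
                                      : TUNE_APPROX_SQRT_DF;
    return (tune & mask) != 0;
}
```

With such a split, a core could set only the SF bit and keep the hardware square root for DF, which is the restriction being considered here.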


Again, your feedback is appreciated.

Thank you,

--
Evandro Menezes
