On 12/08/15 15:35, Evandro Menezes wrote:
Emit square root using the Newton series
2015-12-03 Evandro Menezes <e.mene...@samsung.com>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function.
* config/aarch64/aarch64-simd.md (sqrt<mode>2): New expansion and insn definitions.
* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
* config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function.
* config/aarch64/aarch64.md (sqrt<mode>2): New expansion and insn definitions.
* config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description.
* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
This patch extends the earlier patch that added support for computing
x^-1/2 with the Newton series, adding support for x^1/2 as well.
Is it OK at this point in stage 3?
Thank you,
James,
As I was saying, this patch causes some CPU2000 benchmarks to fail
validation when using DF. Although testing with a vast set of random
values proved the algorithm to be pretty solid, I'm puzzled as to why
some benchmarks fail to validate with this implementation of the Newton
series for square root when they pass with the Newton series for
reciprocal square root.
Since I had no problems with the same algorithm on x86-64, I wonder
whether the initial estimate has something to do with it: on AArch64 it
offers just 8 bits, whereas on x86-64 it offers 11 bits. Then again, the
algorithm iterated one fewer time on x86-64 than on AArch64.
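As a rough way to relate estimate width to iteration count, the sketch below assumes the standard error model for one Newton step on x^-1/2: if the estimate has relative error e, the next one has about 1.5 * e^2, so b correct bits become about 2b - log2(1.5). It targets 24 bits for SF and 53 bits for DF (significand widths including the implicit bit) and ignores the rounding error of each step, which in practice costs some bits. The function name newton_iterations is hypothetical.

```c
#include <math.h>

/* Iterations needed to go from estimate_bits to target_bits, assuming
   each rsqrt Newton step maps relative error e to 1.5 * e^2, i.e.
   b bits -> 2b - log2(1.5) bits.  Rounding errors are ignored. */
int newton_iterations(double estimate_bits, double target_bits)
{
    /* The model only converges if the estimate beats log2(1.5) bits. */
    if (estimate_bits <= log2(1.5))
        return -1;

    int n = 0;
    double bits = estimate_bits;
    while (bits < target_bits) {
        bits = 2.0 * bits - log2(1.5);
        n++;
    }
    return n;
}
```

Under this idealized model, both an 8-bit and an 11-bit estimate need 3 iterations for DF and 2 for SF, so any difference in iteration count between the two targets would have to come from elsewhere, such as rounding or a less accurate estimate than assumed.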
Since the initial estimate seems to be sufficient for CPU2000 to
validate when using SF, I'm leaning towards restricting the Newton
series for square root to SF only.
Your thoughts on the matter are appreciated,
--
Evandro Menezes