On 12/10/15 04:30, Kyrill Tkachov wrote:
On 09/12/15 18:50, Evandro Menezes wrote:
On 12/09/2015 11:16 AM, Kyrill Tkachov wrote:
On 09/12/15 17:02, Kyrill Tkachov wrote:
On 09/12/15 16:59, Evandro Menezes wrote:
On 12/09/2015 10:52 AM, Kyrill Tkachov wrote:
Hi Evandro,
On 08/12/15 21:35, Evandro Menezes wrote:
Emit square root using the Newton series
2015-12-03 Evandro Menezes <e.mene...@samsung.com>
gcc/
* config/aarch64/aarch64-protos.h
(aarch64_emit_swsqrt):
Declare new
function.
* config/aarch64/aarch64-simd.md (sqrt<mode>2): New
expansion and
insn definitions.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
* config/aarch64/aarch64.c (aarch64_emit_swsqrt):
Define
new function.
* config/aarch64/aarch64.md (sqrt<mode>2): New
expansion
and insn
definitions.
* config/aarch64/aarch64.opt
(mlow-precision-recip-sqrt):
Expand option
description.
* doc/invoke.texi (mlow-precision-recip-sqrt):
Likewise.
This patch extends the patch that added support for implementing
x^-1/2 using the Newton series by adding support for x^1/2 as well.
Is it OK at this point of stage 3?
A comment on the patch itself from me...
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 6f7dbce..11c6c9a 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -30,4 +30,4 @@
AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT)
-
+AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT)
That seems like a misleading name to me.
If we're doing this, that means that the sqrt instruction is not
faster
than doing the inverse sqrt estimation followed by a multiply.
I think a name like "synth_sqrt" or "estimate_sqrt" or something
along those lines
is more appropriate.
Unfortunately, this is the case on Exynos M1: the series is faster
than the instruction. :-( So, other targets when this is also
true, using the "fast_sqrt" option might make sense.
Sure, but the way your patch is written, we will emit the series
when "fast_sqrt" is set, rather
than the other way around, unless I'm misreading the logic in:
Sorry, what I meant to say is it would be clearer, IMO, to describe
the compiler action that is being taken
(e.g. the rename_fma_regs tuning flag), in this case it's estimating
sqrt using a series.
Kyrill
diff --git a/gcc/config/aarch64/aarch64-simd.md
b/gcc/config/aarch64/aarch64-simd.md
index 030a101..f6d2da4 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4280,7 +4280,23 @@
;; sqrt
-(define_insn "sqrt<mode>2"
+(define_expand "sqrt<mode>2"
+ [(set (match_operand:VDQF 0 "register_operand")
+ (sqrt:VDQF (match_operand:VDQF 1 "register_operand")))]
+ "TARGET_SIMD"
+{
+ if ((AARCH64_EXTRA_TUNE_FAST_SQRT &
aarch64_tune_params.extra_tuning_flags)
+ && !optimize_function_for_size_p (cfun)
+ && flag_finite_math_only
+ && !flag_trapping_math
+ && flag_unsafe_math_optimizations)
+ {
+ aarch64_emit_swsqrt (operands[0], operands[1]);
+ DONE;
+ }
+})
Kyrill,
How about "approx_sqrt" for, you guessed it, approximate square
root? The same adjective would perhaps describe "recip_sqrt" better
too.
Sounds good to me.
Sorry for the bikeshedding.
Rename the reciprocal square root tuning flag
Rename the tuning option to enable the Newton series for the
reciprocal square
root to reflect its approximative characteristic.
2016-02-22 Evandro Menezes <e.mene...@samsung.com>
gcc/
* config/aarch64/aarch64-tuning-flags.def: Rename tuning
flag to
AARCH64_EXTRA_TUNE_APPROX_RSQRT.
* config/aarch64/aarch64.c (xgene1_tunings): Use new name.
(use_rsqrt_p): Likewise.
* config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
Reword the
text explaining this option.
* doc/invoke.texi (-mlow-precision-recip-sqrt): Likewise.
In preparation for the patch adding the Newton series also for square
root, I'd like to propose this patch changing the name of the existing
tuning flag for the reciprocal square root.
Thank you,
--
Evandro Menezes
>From 7043444f83c12de0ab50627a8b386e3070050591 Mon Sep 17 00:00:00 2001
From: Evandro Menezes <e.mene...@samsung.com>
Date: Mon, 22 Feb 2016 17:49:09 -0600
Subject: [PATCH] Rename the reciprocal square root tuning flag
Rename the tuning option to enable the Newton series for the reciprocal square
root to reflect its approximative characteristic.
2016-02-22 Evandro Menezes <e.mene...@samsung.com>
gcc/
* config/aarch64/aarch64-tuning-flags.def: Rename tuning flag to
AARCH64_EXTRA_TUNE_APPROX_RSQRT.
* config/aarch64/aarch64.c (xgene1_tunings): Use new name.
(use_rsqrt_p): Likewise.
* config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Reword the
text explaining this option.
* doc/invoke.texi (-mlow-precision-recip-sqrt): Likewise.
---
gcc/config/aarch64/aarch64-tuning-flags.def | 2 +-
gcc/config/aarch64/aarch64.c | 4 ++--
gcc/config/aarch64/aarch64.opt | 2 +-
gcc/doc/invoke.texi | 10 ++++------
4 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 8036cfe..7e45a0c 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -29,5 +29,5 @@
AARCH64_TUNE_ to give an enum name. */
AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
-AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT)
+AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 923a4b3..ebf47da 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -586,7 +586,7 @@ static const struct tune_params xgene1_tunings =
0, /* max_case_values. */
0, /* cache_line_size. */
tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */
- (AARCH64_EXTRA_TUNE_RECIP_SQRT) /* tune_flags. */
+ (AARCH64_EXTRA_TUNE_APPROX_RSQRT) /* tune_flags. */
};
/* Support for fine-grained override of the tuning structures. */
@@ -7469,7 +7469,7 @@ use_rsqrt_p (void)
return (!flag_trapping_math
&& flag_unsafe_math_optimizations
&& ((aarch64_tune_params.extra_tuning_flags
- & AARCH64_EXTRA_TUNE_RECIP_SQRT)
+ & AARCH64_EXTRA_TUNE_APPROX_RSQRT)
|| flag_mrecip_low_precision_sqrt));
}
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 5cbd4cd..155d2bd 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -151,5 +151,5 @@ PC relative literal loads.
mlow-precision-recip-sqrt
Common Var(flag_mrecip_low_precision_sqrt) Optimization
-When calculating a sqrt approximation, run fewer steps.
+Calculate the reciprocal square-root approximation in fewer steps.
This reduces precision, but can result in faster computation.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 490df93..eeff24d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12879,12 +12879,10 @@ corresponding flag to the linker.
@item -mno-low-precision-recip-sqrt
@opindex -mlow-precision-recip-sqrt
@opindex -mno-low-precision-recip-sqrt
-The square root estimate uses two steps instead of three for double-precision,
-and one step instead of two for single-precision.
-Thus reducing latency and precision.
-This is only relevant if @option{-ffast-math} activates
-reciprocal square root estimate instructions.
-Which in turn depends on the target processor.
+The reciprocal square root approximation uses one step less than otherwise,
+thus reducing latency and precision.
+This is only relevant if @option{-ffast-math} enables the reciprocal square root
+approximation, which in turn depends on the target processor.
@item -march=@var{name}
@opindex march
--
2.6.3