On 2023/11/29 at 10:33 AM, Xi Ruoyao wrote:
On Wed, 2023-11-29 at 10:23 +0800, Jiahao Xu wrote:
On 2023/11/29 at 10:08 AM, Xi Ruoyao wrote:
On Tue, 2023-11-28 at 11:29 +0800, Jiahao Xu wrote:
diff --git a/gcc/config/loongarch/predicates.md
b/gcc/config/loongarch/predicates.md
index f7796da10b2..9e9ce58cb53 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -235,6 +235,10 @@ (define_predicate "reg_or_1_operand"
     (ior (match_operand 0 "const_1_operand")
          (match_operand 0 "register_operand")))
+(define_predicate "reg_or_vecotr_1_operand"
"vector" instead of "vecotr".

+  (ior (match_operand 0 "const_vector_1_operand")
+       (match_operand 0 "register_operand")))
+@opindex mrecip
+@item -mrecip
+This option enables use of the reciprocal estimate and reciprocal square
+root estimate instructions with additional Newton-Raphson steps to increase
+precision instead of doing a divide or square root and divide for
+floating-point arguments.
+These instructions are generated only when @option{-funsafe-math-optimizations}
+is enabled together with @option{-ffinite-math-only} and
+@option{-fno-trapping-math}.
+Note that while the throughput of the sequence is higher than the throughput of
+the non-reciprocal instruction, the precision of the sequence can be decreased
+by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
+
+@opindex mrecip=opt
We should document that using these options requires the target CPU to
support the frecipe/frsqrte instructions.

I am currently improving this patch by adding an -mfrecipe option to ensure
that the target CPU supports the approximate instructions.
You just need to add a line to
gcc/config/loongarch/genopts/isa-evolution.in:

2       25      frecipe         Support frecipe.{s/d} and frsqrte.{s/d} instructions

Then the -mfrecipe option will be added and can be tested with
TARGET_FRECIPE in GCC code.  -march=native will also detect it properly
because the cpucfg info is included.  Then just add
OPTION_MASK_ISA_FRECIPE to ISA_BASE_LA64V110_FEATURES in
loongarch-cpu.cc.

I'm now implementing it along the lines you suggested. lulu pointed this problem out to me yesterday.
And could we have a __builtin for scalar frecipe/frsqrte too?  Then if
the approximation is not OK for the entire program, but the programmer
knows it's OK for some operations in a hot path, (s)he can code
__builtin_loongarch_frecipe_d (x) for an acceleration.

I agree with this suggestion.
