Re: [PATCH][RFC] tree-optimization/104475 - bogus -Wstringop-overflow

2023-05-05 Thread Richard Biener via Gcc-patches
On Thu, 4 May 2023, Jason Merrill wrote:

> On 5/4/23 09:59, Richard Biener wrote:
> > 
> > I've previously sent
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608077.html
> > adding ADDR_EXPR_NONZERO and there were comments from Jason
> > where I just realized I ignored ARRAY_REF for the following.
> > Anyway, here's a more aggressive variant not going for an extra
> > flag set by the frontend but instead have the middle-end treat
> > all &*.component as non-NULL (all handled_component_p).
> > 
> > This passes bootstrap for all languages, testing there isn't
> > complete but it already shows for example
> > gcc.c-torture/execute/pr44555.c explicitly testing that
> > we keep &p->z NULL when p is NULL and z is at offset zero.
> > 
> > There's also execute FAILs for gfortran.dg/class_optional_2.f90
> > and some optimization dump scan fails I did not yet investigate.
> > 
> > Nevertheless I'd like to hear opinions on whether a middle-end
> > implementation without frontend help is the way to go and
> > what the reasonable restrictions should be there?  Is
> > gcc.c-torture/execute/pr44555.c sanctioned by the C standard?
> > If so I think we have a lost cause without some help from
> > the frontend?
> 
> The relevant C++ rule is https://eel.is/c++draft/expr.ref#8

OK, I take that to mean the nullptr object doesn't have the
type specified by the pointer type but is more or less void,
which would make accessing a subobject invoke undefined behavior.

> The corresponding C clause doesn't have as explicit a rule that I can see, and
> I don't know what the sense of the C committee is about this.  The special
> allowance for the common initial sequence suggests that it is an
> exception to such a rule, but I'm not sure where that rule is, exactly.
> 
> I imagine that not all languages are as strict about this, so an unconditional
> rule like this may not be what we want.

Looks like that, then.  The original proposal would make that
explicit on the ADDR_EXPR, though I could imagine we could alternatively
put a flag on the COMPONENT_REF as well.

> And as I think I commented before, this kind of assumption based on undefined
> behavior ought to have a -fsanitize=undefined check.

Agreed.

I don't feel like poking at the C++ side too much, so I hope that
eventually somebody else will pick this up.

Thanks,
Richard.


[committed] builtins: Fix comment typo mpft_t -> mpfr_t

2023-05-05 Thread Jakub Jelinek via Gcc-patches
Hi!

I've noticed 4 typos in comments, fixed thusly.

Tested on x86_64-linux and i686-linux, committed to trunk as obvious.

2023-05-05  Jakub Jelinek  

* builtins.cc (do_mpfr_ckconv, do_mpc_ckconv): Fix comment typo,
mpft_t -> mpfr_t.
* fold-const-call.cc (do_mpfr_ckconv, do_mpc_ckconv): Likewise.

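The comment being corrected encodes a subtle check: if the value rounded to the narrower target format is zero while the high-precision mpfr_t value is nonzero, the conversion underflowed and the result cannot be trusted. A rough stdlib-only analogue in Python (illustrative only; `underflowed` is a hypothetical helper, not GCC code, with `Decimal` standing in for mpfr_t and `float` for REAL_VALUE_TYPE):

```python
from decimal import Decimal

def underflowed(hi_prec: Decimal) -> bool:
    # Mirrors the do_mpfr_ckconv logic: if the value converted to the
    # narrower format is zero but the high-precision value is not,
    # the conversion underflowed and the result is unusable.
    return float(hi_prec) == 0.0 and hi_prec != 0

print(underflowed(Decimal(10) ** -400))  # True: below double's subnormal range
print(underflowed(Decimal("0.5")))       # False: representable exactly
```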
--- gcc/builtins.cc.jj  2023-04-28 08:32:47.943186874 +0200
+++ gcc/builtins.cc 2023-05-04 16:50:16.935259247 +0200
@@ -10911,7 +10911,7 @@ do_mpfr_ckconv (mpfr_srcptr m, tree type
   real_from_mpfr (&rr, m, type, MPFR_RNDN);
   /* Proceed iff GCC's REAL_VALUE_TYPE can hold the MPFR value,
 check for overflow/underflow.  If the REAL_VALUE_TYPE is zero
-but the mpft_t is not, then we underflowed in the
+but the mpfr_t is not, then we underflowed in the
 conversion.  */
   if (real_isfinite (&rr)
  && (rr.cl == rvc_zero) == (mpfr_zero_p (m) != 0))
@@ -10952,7 +10952,7 @@ do_mpc_ckconv (mpc_srcptr m, tree type,
   real_from_mpfr (&im, mpc_imagref (m), TREE_TYPE (type), MPFR_RNDN);
   /* Proceed iff GCC's REAL_VALUE_TYPE can hold the MPFR values,
 check for overflow/underflow.  If the REAL_VALUE_TYPE is zero
-but the mpft_t is not, then we underflowed in the
+but the mpfr_t is not, then we underflowed in the
 conversion.  */
   if (force_convert
  || (real_isfinite (&re) && real_isfinite (&im)
--- gcc/fold-const-call.cc.jj   2023-04-19 09:33:59.51592 +0200
+++ gcc/fold-const-call.cc  2023-05-04 16:50:33.736019179 +0200
@@ -101,7 +101,7 @@ do_mpfr_ckconv (real_value *result, mpfr
   real_from_mpfr (&tmp, m, format, MPFR_RNDN);
 
   /* Proceed iff GCC's REAL_VALUE_TYPE can hold the MPFR values.
- If the REAL_VALUE_TYPE is zero but the mpft_t is not, then we
+ If the REAL_VALUE_TYPE is zero but the mpfr_t is not, then we
  underflowed in the conversion.  */
   if (!real_isfinite (&tmp)
   || ((tmp.cl == rvc_zero) != (mpfr_zero_p (m) != 0)))
@@ -295,7 +295,7 @@ do_mpc_ckconv (real_value *result_real,
   real_from_mpfr (&tmp_imag, mpc_imagref (m), format, MPFR_RNDN);
 
   /* Proceed iff GCC's REAL_VALUE_TYPE can hold the MPFR values.
- If the REAL_VALUE_TYPE is zero but the mpft_t is not, then we
+ If the REAL_VALUE_TYPE is zero but the mpfr_t is not, then we
  underflowed in the conversion.  */
   if (!real_isfinite (&tmp_real)
   || !real_isfinite (&tmp_imag)

Jakub



[PATCH] gimple-range-op: Improve handling of sqrt ranges

2023-05-05 Thread Jakub Jelinek via Gcc-patches
Hi!

The previous patch just added basic intrinsic ranges for sqrt
([-0.0, +Inf] +-NAN being the general result range of the function
and [-0.0, +Inf] the general operand range if the result isn't NAN etc.).
The following patch intersects those ranges with the particular range
computed from the argument's or result's exact range, with the expected
error in ulps taken into account, and adds a function (an frange_arithmetic
variant) which other functions can use as a helper as well.

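The bound-widening idea can be modeled in a few lines (an illustrative Python sketch under the assumption of a symmetric ulps error budget, not the GCC implementation; `math.nextafter` needs Python 3.9+):

```python
import math

def sqrt_range(lo, hi, ulps=2):
    """Conservative result range for sqrt over [lo, hi]: take sqrt of
    each bound, then widen each side by `ulps` ULPs to account for the
    libm implementation's maximum error."""
    r_lo, r_hi = math.sqrt(lo), math.sqrt(hi)
    for _ in range(ulps):
        r_lo = math.nextafter(r_lo, -math.inf)  # step one ULP toward -inf
        r_hi = math.nextafter(r_hi, math.inf)   # step one ULP toward +inf
    return r_lo, r_hi

lo, hi = sqrt_range(4.0, 9.0)
print(lo <= 2.0 and 3.0 <= hi)  # True
```

The real patch additionally has to handle rounding direction, composite modes, and `-frounding-math`, which is what the new frange_mpfr_arg1 helper below deals with.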
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-05-05  Jakub Jelinek  

* value-range.h (frange_arithmetic): Declare.
* range-op-float.cc (frange_arithmetic): No longer static.
* gimple-range-op.cc (frange_mpfr_arg1): New function.
(cfn_sqrt::fold_range): Intersect the generic boundaries range
with range computed from sqrt of the particular bounds.
(cfn_sqrt::op1_range): Intersect the generic boundaries range
with range computed from squared particular bounds.

* gcc.dg/tree-ssa/range-sqrt-2.c: New test.

--- gcc/value-range.h.jj2023-05-04 13:34:45.140336099 +0200
+++ gcc/value-range.h   2023-05-04 16:28:18.286108178 +0200
@@ -1294,5 +1294,8 @@ frange::nan_signbit_p (bool &signbit) co
 
 void frange_nextafter (enum machine_mode, REAL_VALUE_TYPE &,
   const REAL_VALUE_TYPE &);
+void frange_arithmetic (enum tree_code, tree, REAL_VALUE_TYPE &,
+   const REAL_VALUE_TYPE &, const REAL_VALUE_TYPE &,
+   const REAL_VALUE_TYPE &);
 
 #endif // GCC_VALUE_RANGE_H
--- gcc/range-op-float.cc.jj2023-05-04 13:34:45.139336114 +0200
+++ gcc/range-op-float.cc   2023-05-04 16:28:18.285108192 +0200
@@ -305,7 +305,7 @@ frange_nextafter (enum machine_mode mode
 // SF/DFmode (when storing into memory from the 387 stack).  Maybe
 // this is ok as well though it is just occasionally more precise. ??
 
-static void
+void
 frange_arithmetic (enum tree_code code, tree type,
   REAL_VALUE_TYPE &result,
   const REAL_VALUE_TYPE &op1,
--- gcc/gimple-range-op.cc.jj   2023-05-04 13:34:45.139336114 +0200
+++ gcc/gimple-range-op.cc  2023-05-04 19:58:44.842606865 +0200
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.
 #include "value-query.h"
 #include "gimple-range.h"
 #include "attr-fnspec.h"
+#include "realmpfr.h"
 
 // Given stmt S, fill VEC, up to VEC_SIZE elements, with relevant ssa-names
 // on the statement.  For efficiency, it is an error to not pass in enough
@@ -403,6 +404,66 @@ public:
   }
 } op_cfn_copysign;
 
+/* Compute FUNC (ARG) where FUNC is a mpfr function.  If RES_LOW is non-NULL,
+   set it to low bound of possible range if the function is expected to have
+   ULPS precision and similarly if RES_HIGH is non-NULL, set it to high bound.
+   If the function returns false, the results weren't set.  */
+
+static bool
+frange_mpfr_arg1 (REAL_VALUE_TYPE *res_low, REAL_VALUE_TYPE *res_high,
+ int (*func) (mpfr_ptr, mpfr_srcptr, mpfr_rnd_t),
+ const REAL_VALUE_TYPE &arg, tree type, unsigned ulps)
+{
+  if (ulps == ~0U || !real_isfinite (&arg))
+return false;
+  machine_mode mode = TYPE_MODE (type);
+  const real_format *format = REAL_MODE_FORMAT (mode);
+  auto_mpfr m (format->p);
+  mpfr_from_real (m, &arg, MPFR_RNDN);
+  mpfr_clear_flags ();
+  bool inexact = func (m, m, MPFR_RNDN);
+  if (!mpfr_number_p (m) || mpfr_overflow_p () || mpfr_underflow_p ())
+return false;
+
+  REAL_VALUE_TYPE value, result;
+  real_from_mpfr (&value, m, format, MPFR_RNDN);
+  if (!real_isfinite (&value))
+return false;
+  if ((value.cl == rvc_zero) != (mpfr_zero_p (m) != 0))
+inexact = true;
+
+  real_convert (&result, format, &value);
+  if (!real_isfinite (&result))
+return false;
+  bool round_low = false;
+  bool round_high = false;
+  if (!ulps && flag_rounding_math)
+++ulps;
+  if (inexact || !real_identical (&result, &value))
+{
+  if (MODE_COMPOSITE_P (mode))
+   round_low = round_high = true;
+  else
+   {
+ round_low = !real_less (&result, &value);
+ round_high = !real_less (&value, &result);
+   }
+}
+  if (res_low)
+{
+  *res_low = result;
+  for (unsigned int i = 0; i < ulps + round_low; ++i)
+   frange_nextafter (mode, *res_low, dconstninf);
+}
+  if (res_high)
+{
+  *res_high = result;
+  for (unsigned int i = 0; i < ulps + round_high; ++i)
+   frange_nextafter (mode, *res_high, dconstinf);
+}
+  return true;
+}
+
 class cfn_sqrt : public range_operator_float
 {
 public:
@@ -434,6 +495,20 @@ public:
   }
 if (!lh.maybe_isnan () && !real_less (&lh.lower_bound (), &dconst0))
   r.clear_nan ();
+
+unsigned ulps
+  = targetm.libm_function_max_error (CFN_SQRT, TYPE_MODE (type), false);
+if (ulps == ~0U)
+  return true;
+REAL_VALUE_TYPE lb = lh.lower_bound ();
REAL_VALUE_TYPE ub = lh.upper_bound ();

[PATCH 06/23] arm: [MVE intrinsics] factorize vabdq

2023-05-05 Thread Christophe Lyon via Gcc-patches
2022-09-08  Christophe Lyon 

gcc/
* config/arm/iterators.md (MVE_FP_M_BINARY): Add vabdq.
(MVE_FP_VABDQ_ONLY): New.
(mve_insn): Add vabd.
* config/arm/mve.md (mve_vabdq_f): Move into ...
(@mve_q_f): ... this.
(mve_vabdq_m_f): Remove.
---
 gcc/config/arm/iterators.md |  9 +++--
 gcc/config/arm/mve.md   | 25 +
 2 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index c53b42a86e9..3133642ea82 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -466,6 +466,7 @@ (define_int_iterator MVE_RSHIFT_N   [
 ])
 
 (define_int_iterator MVE_FP_M_BINARY   [
+VABDQ_M_F
 VADDQ_M_F
 VMULQ_M_F
 VSUBQ_M_F
@@ -490,6 +491,10 @@ (define_int_iterator MVE_FP_N_BINARY   [
 VSUBQ_N_F
 ])
 
+(define_int_iterator MVE_FP_VABDQ_ONLY [
+VABDQ_F
+])
+
 (define_int_iterator MVE_FP_CREATE_ONLY [
 VCREATEQ_F
 ])
@@ -501,8 +506,8 @@ (define_code_attr mve_addsubmul [
 ])
 
 (define_int_attr mve_insn [
-(VABDQ_M_S "vabd") (VABDQ_M_U "vabd")
-(VABDQ_S "vabd") (VABDQ_U "vabd")
+(VABDQ_M_S "vabd") (VABDQ_M_U "vabd") (VABDQ_M_F "vabd")
+(VABDQ_S "vabd") (VABDQ_U "vabd") (VABDQ_F "vabd")
 (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
 (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
 (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index fb1076aef73..c8cb4e430ac 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1451,17 +1451,17 @@ (define_insn "mve_vrshrq_n_"
 ])
 
 ;;
-;; [vabdq_f])
+;; [vabdq_f]
 ;;
-(define_insn "mve_vabdq_f"
+(define_insn "@mve_q_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
   (match_operand:MVE_0 2 "s_register_operand" "w")]
-VABDQ_F))
+MVE_FP_VABDQ_ONLY))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vabd.f%# %q0, %q1, %q2"
+  ".f%#\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -5483,24 +5483,9 @@ (define_insn "mve_vrmlsldavhaxq_p_sv4si"
   "vpst\;vrmlsldavhaxt.s32\t%Q0, %R0, %q2, %q3"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
-;;
-;; [vabdq_m_f])
-;;
-(define_insn "mve_vabdq_m_f"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-   (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-  (match_operand:MVE_0 2 "s_register_operand" "w")
-  (match_operand:MVE_0 3 "s_register_operand" "w")
-  (match_operand: 4 "vpr_register_operand" "Up")]
-VABDQ_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vabdt.f%#  %q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
 
 ;;
+;; [vabdq_m_f]
 ;; [vaddq_m_f]
 ;; [vsubq_m_f]
 ;; [vmulq_m_f]
-- 
2.34.1



[PATCH 02/23] arm: [MVE intrinsics] factorize vqrshlq vrshlq

2023-05-05 Thread Christophe Lyon via Gcc-patches
Factorize vqrshlq, vrshlq so that they use the same pattern.
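As a reminder of the semantics being factorized (my reading of the rounding-shift behavior: a negative shift count acts as a rounding right shift), here is a small single-lane Python model with an illustrative 8-bit lane width — an assumption-laden sketch, not authoritative Arm documentation:

```python
def vrshl(a, b, bits=8):
    """Model of a rounding shift left on one signed lane:
    non-negative b shifts left; negative b does a rounding right
    shift (adds 1 << (n-1) before shifting right by n).  Results
    wrap modulo the lane width, since vrshlq is not saturating."""
    if b >= 0:
        r = a << b
    else:
        n = -b
        r = (a + (1 << (n - 1))) >> n
    r &= (1 << bits) - 1
    if r >= 1 << (bits - 1):       # reinterpret as signed
        r -= 1 << bits
    return r

print(vrshl(5, -1))   # rounds 2.5 up to 3
print(vrshl(1, 3))    # plain left shift: 8
```

The vq-prefixed variant (vqrshlq) additionally saturates instead of wrapping, which is why both map onto the same pattern with only the unspec differing.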

2022-09-08  Christophe Lyon 

gcc/
* config/arm/iterators.md (MVE_RSHIFT_M_N, MVE_RSHIFT_N): New.
(mve_insn): Add vqrshl, vrshl.
* config/arm/mve.md (mve_vqrshlq_n_)
(mve_vrshlq_n_): Merge into ...
(@mve_q_n_): ... this.
(mve_vqrshlq_m_n_, mve_vrshlq_m_n_): Merge
into ...
(@mve_q_m_n_): ... this.
---
 gcc/config/arm/iterators.md | 14 +++
 gcc/config/arm/mve.md   | 49 -
 2 files changed, 24 insertions(+), 39 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 593be83e0be..e7622fe752a 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -435,6 +435,16 @@ (define_int_iterator MVE_INT_N_BINARY_LOGIC   [
 VORRQ_N_S VORRQ_N_U
 ])
 
+(define_int_iterator MVE_RSHIFT_M_N   [
+VQRSHLQ_M_N_S VQRSHLQ_M_N_U
+VRSHLQ_M_N_S VRSHLQ_M_N_U
+])
+
+(define_int_iterator MVE_RSHIFT_N   [
+VQRSHLQ_N_S VQRSHLQ_N_U
+VRSHLQ_N_S VRSHLQ_N_U
+])
+
 (define_int_iterator MVE_FP_M_BINARY   [
 VADDQ_M_F
 VMULQ_M_F
@@ -526,7 +536,9 @@ (define_int_attr mve_insn [
 (VQRDMULHQ_M_S "vqrdmulh")
 (VQRDMULHQ_N_S "vqrdmulh")
 (VQRDMULHQ_S "vqrdmulh")
+(VQRSHLQ_M_N_S "vqrshl") (VQRSHLQ_M_N_U "vqrshl")
 (VQRSHLQ_M_S "vqrshl") (VQRSHLQ_M_U "vqrshl")
+(VQRSHLQ_N_S "vqrshl") (VQRSHLQ_N_U "vqrshl")
 (VQRSHLQ_S "vqrshl") (VQRSHLQ_U "vqrshl")
 (VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
 (VQSHLQ_S "vqshl") (VQSHLQ_U "vqshl")
@@ -538,7 +550,9 @@ (define_int_attr mve_insn [
 (VRHADDQ_S "vrhadd") (VRHADDQ_U "vrhadd")
 (VRMULHQ_M_S "vrmulh") (VRMULHQ_M_U "vrmulh")
 (VRMULHQ_S "vrmulh") (VRMULHQ_U "vrmulh")
+(VRSHLQ_M_N_S "vrshl") (VRSHLQ_M_N_U "vrshl")
 (VRSHLQ_M_S "vrshl") (VRSHLQ_M_U "vrshl")
+(VRSHLQ_N_S "vrshl") (VRSHLQ_N_U "vrshl")
 (VRSHLQ_S "vrshl") (VRSHLQ_U "vrshl")
 (VSHLQ_M_S "vshl") (VSHLQ_M_U "vshl")
 (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 6b88fdb8a7a..0d3343b6e29 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1373,17 +1373,18 @@ (define_expand "mve_vorrq_u"
 )
 
 ;;
-;; [vqrshlq_n_s, vqrshlq_n_u])
+;; [vqrshlq_n_s, vqrshlq_n_u]
+;; [vrshlq_n_u, vrshlq_n_s]
 ;;
-(define_insn "mve_vqrshlq_n_"
+(define_insn "@mve_q_n_"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
   (match_operand:SI 2 "s_register_operand" "r")]
-VQRSHLQ_N))
+MVE_RSHIFT_N))
   ]
   "TARGET_HAVE_MVE"
-  "vqrshl.%#\t%q0, %2"
+  ".%#\t%q0, %2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -1432,21 +1433,6 @@ (define_insn "mve_vqshluq_n_s"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vrshlq_n_u, vrshlq_n_s])
-;;
-(define_insn "mve_vrshlq_n_"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-  (match_operand:SI 2 "s_register_operand" "r")]
-VRSHLQ_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vrshl.%#\t%q0, %2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vrshrq_n_s, vrshrq_n_u])
 ;;
@@ -3098,18 +3084,19 @@ (define_insn "mve_vqrdmlsdhxq_s"
 ])
 
 ;;
-;; [vqrshlq_m_n_s, vqrshlq_m_n_u])
+;; [vqrshlq_m_n_s, vqrshlq_m_n_u]
+;; [vrshlq_m_n_s, vrshlq_m_n_u]
 ;;
-(define_insn "mve_vqrshlq_m_n_"
+(define_insn "@mve_q_m_n_"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
   (match_operand:SI 2 "s_register_operand" "r")
   (match_operand: 3 "vpr_register_operand" "Up")]
-VQRSHLQ_M_N))
+MVE_RSHIFT_M_N))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vqrshlt.%#   %q0, %2"
+  "vpst\;t.%#\t%q0, %2"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
@@ -3145,22 +3132,6 @@ (define_insn "mve_vrev64q_m_"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
-;;
-;; [vrshlq_m_n_s, vrshlq_m_n_u])
-;;
-(define_insn "mve_vrshlq_m_n_"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-  (match_operand:SI 2 "s_register_operand" "r")
-  (match_operand: 3 "vpr_register_operand" "Up")]
-VRSHLQ_M_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vrshlt.%#\t%q0, %2"
-  [(set_attr "type" 

[PATCH 01/23] arm: [MVE intrinsics] add binary_round_lshift shape

2023-05-05 Thread Christophe Lyon via Gcc-patches
This patch adds the binary_round_lshift shape description.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_round_lshift): New.
* config/arm/arm-mve-builtins-shapes.h (binary_round_lshift): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 61 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 62 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 5e6681c784a..28a2d66ddd1 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -458,6 +458,67 @@ struct binary_orrq_def : public overloaded_base<0>
 };
 SHAPE (binary_orrq)
 
+/* _t vfoo[t0](_t, _t)
+   _t vfoo[_n_t0](_t, int32_t)
+
+   Shape for rounding shift left operations.
+
+   Example: vrshlq.
+   int8x16_t [__arm_]vrshlq[_n_s8](int8x16_t a, int32_t b)
+   int8x16_t [__arm_]vrshlq_m_n[_s8](int8x16_t a, int32_t b, mve_pred16_t p)
+   int8x16_t [__arm_]vrshlq[_s8](int8x16_t a, int8x16_t b)
+   int8x16_t [__arm_]vrshlq_m[_s8](int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
+   int8x16_t [__arm_]vrshlq_x[_s8](int8x16_t a, int8x16_t b, mve_pred16_t p)  */
+struct binary_round_lshift_def : public overloaded_base<0>
+{
+  bool
+  explicit_mode_suffix_p (enum predication_index pred, enum mode_suffix_index mode) const override
+  {
+return ((mode == MODE_n)
+   && (pred == PRED_m));
+  }
+
+  bool
+  skip_overload_p (enum predication_index pred, enum mode_suffix_index mode) const override
+  {
+switch (mode)
+  {
+  case MODE_none:
+   return false;
+
+   /* For MODE_n, share the overloaded instance with MODE_none, except for PRED_m.  */
+  case MODE_n:
+   return pred != PRED_m;
+
+  default:
+   gcc_unreachable ();
+  }
+  }
+
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+b.add_overloaded_functions (group, MODE_n, preserve_user_namespace);
+build_all (b, "v0,v0,vs0", group, MODE_none, preserve_user_namespace);
+build_all (b, "v0,v0,ss32", group, MODE_n, preserve_user_namespace, false, preds_m_or_none);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (2, i, nargs)
+   || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+return r.finish_opt_n_resolution (i, 0, type, TYPE_signed);
+  }
+};
+SHAPE (binary_round_lshift)
+
 /* xN_t vfoo[_t0](uint64_t, uint64_t)
 
where there are N arguments in total.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index 3305d12877a..cef081aa8ec 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -37,6 +37,7 @@ namespace arm_mve
 extern const function_shape *const binary;
 extern const function_shape *const binary_opt_n;
 extern const function_shape *const binary_orrq;
+extern const function_shape *const binary_round_lshift;
 extern const function_shape *const create;
 extern const function_shape *const inherent;
 extern const function_shape *const unary_convert;
-- 
2.34.1



[PATCH 10/23] arm: [MVE intrinsics] add binary_lshift_r shape

2023-05-05 Thread Christophe Lyon via Gcc-patches
This patch adds the binary_lshift_r shape description.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_lshift_r): New.
* config/arm/arm-mve-builtins-shapes.h (binary_lshift_r): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 41 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 42 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index e5093c3f29d..4ecb612ece5 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -576,6 +576,47 @@ struct binary_lshift_def : public overloaded_base<0>
 };
 SHAPE (binary_lshift)
 
+/* Used with the above form, but only for the MODE_r case which does
+   not always support the same set of predicates as MODE_none and
+   MODE_n.  For vqshlq they are the same, but for vshlq they are not.
+
+   _t vfoo_r[_t0](_t, int32_t)
+
+   i.e. the standard shape for shift operations that operate on
+   vector types.
+   Example: vshlq.
+   int8x16_t [__arm_]vshlq_r[_s8](int8x16_t a, int32_t b)
+   int8x16_t [__arm_]vshlq_m_r[_s8](int8x16_t a, int32_t b, mve_pred16_t p)  */
+struct binary_lshift_r_def : public overloaded_base<0>
+{
+  bool
+  explicit_mode_suffix_p (enum predication_index, enum mode_suffix_index) const override
+  {
+return true;
+  }
+
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_r, preserve_user_namespace);
+build_all (b, "v0,v0,ss32", group, MODE_r, preserve_user_namespace, false, preds_m_or_none);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (2, i, nargs)
+   || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+return r.finish_opt_n_resolution (i, 0, type, TYPE_signed);
+  }
+};
+SHAPE (binary_lshift_r)
+
 /* xN_t vfoo[_t0](uint64_t, uint64_t)
 
where there are N arguments in total.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index e472862ceef..25d9b60a670 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -36,6 +36,7 @@ namespace arm_mve
 
 extern const function_shape *const binary;
 extern const function_shape *const binary_lshift;
+extern const function_shape *const binary_lshift_r;
 extern const function_shape *const binary_opt_n;
 extern const function_shape *const binary_orrq;
 extern const function_shape *const binary_round_lshift;
-- 
2.34.1



[PATCH 04/23] arm: [MVE intrinsics] factorize vqshlq vshlq

2023-05-05 Thread Christophe Lyon via Gcc-patches
Factorize vqshlq and vshlq so that they use the same pattern.
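For contrast with the rounding variants in the previous patch, a single-lane Python model of the saturating shift being merged here (my reading of the semantics, with an illustrative 8-bit lane width; treat as an assumption, not a reference):

```python
def vqshl(a, b, bits=8):
    """Model of a saturating shift left on one signed lane:
    non-negative b shifts left and saturates to the lane's signed
    range; negative b is a plain (truncating) right shift."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    r = a << b if b >= 0 else a >> -b
    return max(lo, min(hi, r))

print(vqshl(100, 1))   # 200 saturates to 127
print(vqshl(-100, 1))  # -200 saturates to -128
print(vqshl(5, -1))    # truncating right shift: 2
```

The plain vshlq differs only in wrapping instead of saturating, so both fit the same RTL pattern with the unspec selecting the instruction.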

2022-09-08  Christophe Lyon 

gcc/
* config/arm/iterators.md (MVE_SHIFT_M_R, MVE_SHIFT_M_N)
(MVE_SHIFT_N, MVE_SHIFT_R): New.
(mve_insn): Add vqshl, vshl.
* config/arm/mve.md (mve_vqshlq_n_)
(mve_vshlq_n_): Merge into ...
(@mve_q_n_): ... this.
(mve_vqshlq_r_, mve_vshlq_r_): Merge into
...
(@mve_q_r_): ... this.
(mve_vqshlq_m_r_, mve_vshlq_m_r_): Merge
into ...
(@mve_q_m_r_): ... this.
(mve_vqshlq_m_n_, mve_vshlq_m_n_): Merge
into ...
(@mve_q_m_n_): ... this.
* config/arm/vec-common.md (mve_vshlq_): Transform
into ...
(@mve_q_): ... this.
---
 gcc/config/arm/iterators.md  | 29 +++
 gcc/config/arm/mve.md| 99 
 gcc/config/arm/vec-common.md |  4 +-
 3 files changed, 51 insertions(+), 81 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index e7622fe752a..c53b42a86e9 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -435,6 +435,26 @@ (define_int_iterator MVE_INT_N_BINARY_LOGIC   [
 VORRQ_N_S VORRQ_N_U
 ])
 
+(define_int_iterator MVE_SHIFT_M_R   [
+VQSHLQ_M_R_S VQSHLQ_M_R_U
+VSHLQ_M_R_S VSHLQ_M_R_U
+])
+
+(define_int_iterator MVE_SHIFT_M_N   [
+VQSHLQ_M_N_S VQSHLQ_M_N_U
+VSHLQ_M_N_S VSHLQ_M_N_U
+])
+
+(define_int_iterator MVE_SHIFT_N   [
+VQSHLQ_N_S VQSHLQ_N_U
+VSHLQ_N_S VSHLQ_N_U
+])
+
+(define_int_iterator MVE_SHIFT_R   [
+VQSHLQ_R_S VQSHLQ_R_U
+VSHLQ_R_S VSHLQ_R_U
+])
+
 (define_int_iterator MVE_RSHIFT_M_N   [
 VQRSHLQ_M_N_S VQRSHLQ_M_N_U
 VRSHLQ_M_N_S VRSHLQ_M_N_U
@@ -540,7 +560,11 @@ (define_int_attr mve_insn [
 (VQRSHLQ_M_S "vqrshl") (VQRSHLQ_M_U "vqrshl")
 (VQRSHLQ_N_S "vqrshl") (VQRSHLQ_N_U "vqrshl")
 (VQRSHLQ_S "vqrshl") (VQRSHLQ_U "vqrshl")
+(VQSHLQ_M_N_S "vqshl") (VQSHLQ_M_N_U "vqshl")
+(VQSHLQ_M_R_S "vqshl") (VQSHLQ_M_R_U "vqshl")
 (VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
+(VQSHLQ_N_S "vqshl") (VQSHLQ_N_U "vqshl")
+(VQSHLQ_R_S "vqshl") (VQSHLQ_R_U "vqshl")
 (VQSHLQ_S "vqshl") (VQSHLQ_U "vqshl")
 (VQSUBQ_M_N_S "vqsub") (VQSUBQ_M_N_U "vqsub")
 (VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
@@ -554,7 +578,12 @@ (define_int_attr mve_insn [
 (VRSHLQ_M_S "vrshl") (VRSHLQ_M_U "vrshl")
 (VRSHLQ_N_S "vrshl") (VRSHLQ_N_U "vrshl")
 (VRSHLQ_S "vrshl") (VRSHLQ_U "vrshl")
+(VSHLQ_M_N_S "vshl") (VSHLQ_M_N_U "vshl")
+(VSHLQ_M_R_S "vshl") (VSHLQ_M_R_U "vshl")
 (VSHLQ_M_S "vshl") (VSHLQ_M_U "vshl")
+(VSHLQ_N_S "vshl") (VSHLQ_N_U "vshl")
+(VSHLQ_R_S "vshl") (VSHLQ_R_U "vshl")
+(VSHLQ_S "vshl") (VSHLQ_U "vshl")
 (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
 (VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F "vsub")
 (VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F "vsub")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 0d3343b6e29..fb1076aef73 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1389,32 +1389,34 @@ (define_insn "@mve_q_n_"
 ])
 
 ;;
-;; [vqshlq_n_s, vqshlq_n_u])
+;; [vqshlq_n_s, vqshlq_n_u]
+;; [vshlq_n_u, vshlq_n_s]
 ;;
-(define_insn "mve_vqshlq_n_"
+(define_insn "@mve_q_n_"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
   (match_operand:SI 2 "immediate_operand" "i")]
-VQSHLQ_N))
+MVE_SHIFT_N))
   ]
   "TARGET_HAVE_MVE"
-  "vqshl.%#\t%q0, %q1, %2"
+  ".%#\t%q0, %q1, %2"
   [(set_attr "type" "mve_move")
 ])
 
 ;;
-;; [vqshlq_r_u, vqshlq_r_s])
+;; [vqshlq_r_u, vqshlq_r_s]
+;; [vshlq_r_s, vshlq_r_u]
 ;;
-(define_insn "mve_vqshlq_r_"
+(define_insn "@mve_q_r_"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
   (match_operand:SI 2 "s_register_operand" "r")]
-VQSHLQ_R))
+MVE_SHIFT_R))
   ]
   "TARGET_HAVE_MVE"
-  "vqshl.%#\t%q0, %2"
+  ".%#\t%q0, %2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -1448,36 +1450,6 @@ (define_insn "mve_vrshrq_n_"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vshlq_n_u, vshlq_n_s])
-;;
-(define_insn "mve_vshlq_n_"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (unspec:MVE

[PATCH 11/23] arm: [MVE intrinsics] add unspec_mve_function_exact_insn_vshl

2023-05-05 Thread Christophe Lyon via Gcc-patches
Introduce a function that will be used to build vshl intrinsics. They
are special because they have to handle MODE_r.

2022-09-08  Christophe Lyon 

gcc/
* config/arm/arm-mve-builtins-functions.h (class
unspec_mve_function_exact_insn_vshl): New.
---
 gcc/config/arm/arm-mve-builtins-functions.h | 150 
 1 file changed, 150 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
index 5abf913d182..533fd1159c6 100644
--- a/gcc/config/arm/arm-mve-builtins-functions.h
+++ b/gcc/config/arm/arm-mve-builtins-functions.h
@@ -376,6 +376,156 @@ public:
   }
 };
 
+/* Map the function directly to CODE (UNSPEC, M) for vshl-like
+   builtins.  The difference from unspec_mve_function_exact_insn is
+   that this function handles MODE_r and the related unspecs.  */
+class unspec_mve_function_exact_insn_vshl : public function_base
+{
+public:
+  CONSTEXPR unspec_mve_function_exact_insn_vshl (int unspec_for_sint,
+int unspec_for_uint,
+int unspec_for_n_sint,
+int unspec_for_n_uint,
+int unspec_for_m_sint,
+int unspec_for_m_uint,
+int unspec_for_m_n_sint,
+int unspec_for_m_n_uint,
+int unspec_for_m_r_sint,
+int unspec_for_m_r_uint,
+int unspec_for_r_sint,
+int unspec_for_r_uint)
+: m_unspec_for_sint (unspec_for_sint),
+  m_unspec_for_uint (unspec_for_uint),
+  m_unspec_for_n_sint (unspec_for_n_sint),
+  m_unspec_for_n_uint (unspec_for_n_uint),
+  m_unspec_for_m_sint (unspec_for_m_sint),
+  m_unspec_for_m_uint (unspec_for_m_uint),
+  m_unspec_for_m_n_sint (unspec_for_m_n_sint),
+  m_unspec_for_m_n_uint (unspec_for_m_n_uint),
+  m_unspec_for_m_r_sint (unspec_for_m_r_sint),
+  m_unspec_for_m_r_uint (unspec_for_m_r_uint),
+  m_unspec_for_r_sint (unspec_for_r_sint),
+  m_unspec_for_r_uint (unspec_for_r_uint)
+  {}
+
+  /* The unspec code associated with signed-integer, unsigned-integer
+ and floating-point operations respectively.  It covers the cases
+ with the _n suffix, and/or the _m predicate.  */
+  int m_unspec_for_sint;
+  int m_unspec_for_uint;
+  int m_unspec_for_n_sint;
+  int m_unspec_for_n_uint;
+  int m_unspec_for_m_sint;
+  int m_unspec_for_m_uint;
+  int m_unspec_for_m_n_sint;
+  int m_unspec_for_m_n_uint;
+  int m_unspec_for_m_r_sint;
+  int m_unspec_for_m_r_uint;
+  int m_unspec_for_r_sint;
+  int m_unspec_for_r_uint;
+
+  rtx
+  expand (function_expander &e) const override
+  {
+insn_code code;
+switch (e.pred)
+  {
+  case PRED_none:
+   switch (e.mode_suffix_id)
+ {
+ case MODE_none:
+   /* No predicate, no suffix.  */
+   if (e.type_suffix (0).unsigned_p)
+ code = code_for_mve_q (m_unspec_for_uint, m_unspec_for_uint, e.vector_mode (0));
+   else
+ code = code_for_mve_q (m_unspec_for_sint, m_unspec_for_sint, e.vector_mode (0));
+   break;
+
+ case MODE_n:
+   /* No predicate, _n suffix.  */
+   if (e.type_suffix (0).unsigned_p)
+ code = code_for_mve_q_n (m_unspec_for_n_uint, m_unspec_for_n_uint, e.vector_mode (0));
+   else
+ code = code_for_mve_q_n (m_unspec_for_n_sint, m_unspec_for_n_sint, e.vector_mode (0));
+   break;
+
+ case MODE_r:
+   /* No predicate, _r suffix.  */
+   if (e.type_suffix (0).unsigned_p)
+ code = code_for_mve_q_r (m_unspec_for_r_uint, m_unspec_for_r_uint, e.vector_mode (0));
+   else
+ code = code_for_mve_q_r (m_unspec_for_r_sint, m_unspec_for_r_sint, e.vector_mode (0));
+   break;
+
+ default:
+   gcc_unreachable ();
+ }
+   return e.use_exact_insn (code);
+
+  case PRED_m:
+   switch (e.mode_suffix_id)
+ {
+ case MODE_none:
+   /* No suffix, "m" predicate.  */
+   if (e.type_suffix (0).unsigned_p)
+ code = code_for_mve_q_m (m_unspec_for_m_uint, m_unspec_for_m_uint, e.vector_mode (0));
+   else
+ code = code_for_mve_q_m (m_unspec_for_m_sint, m_unspec_for_m_sint, e.vector_mode (0));
+   break;
+
+ case MODE_n:
+   /* _n suffix, "m" predicate.  */
+   if (e.type_suffix (0).unsigned_p)
+ code = code_for_mve_q_m_n (m_unspec_for_m_n_uint, m_unspec_for_m_n_uint, e.vector_mode (0));
+   else
+ code = code_for_mve_q_m_n (m_unspec_fo

[PATCH 09/23] arm: [MVE intrinsics] add support for MODE_r

2023-05-05 Thread Christophe Lyon via Gcc-patches
2022-09-08  Christophe Lyon 

gcc/
* config/arm/arm-mve-builtins.cc (has_inactive_argument)
(finish_opt_n_resolution): Handle MODE_r.
* config/arm/arm-mve-builtins.def (r): New mode.
---
 gcc/config/arm/arm-mve-builtins.cc  | 8 ++--
 gcc/config/arm/arm-mve-builtins.def | 1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index 91b3ae71f94..c25b1be9903 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -669,7 +669,8 @@ function_instance::has_inactive_argument () const
   if (pred != PRED_m)
 return false;
 
-  if ((base == functions::vorrq && mode_suffix_id == MODE_n)
+  if (mode_suffix_id == MODE_r
+  || (base == functions::vorrq && mode_suffix_id == MODE_n)
   || (base == functions::vqrshlq && mode_suffix_id == MODE_n)
   || (base == functions::vrshlq && mode_suffix_id == MODE_n))
 return false;
@@ -1522,7 +1523,10 @@ finish_opt_n_resolution (unsigned int argno, unsigned 
int first_argno,
 {
   if (inferred_type == NUM_TYPE_SUFFIXES)
 inferred_type = first_type;
-  tree scalar_form = lookup_form (MODE_n, inferred_type);
+  mode_suffix_index scalar_mode = MODE_n;
+  if (mode_suffix_id == MODE_r)
+scalar_mode = MODE_r;
+  tree scalar_form = lookup_form (scalar_mode, inferred_type);
 
   /* Allow the final argument to be scalar, if an _n form exists.  */
   if (scalar_argument_p (argno))
diff --git a/gcc/config/arm/arm-mve-builtins.def 
b/gcc/config/arm/arm-mve-builtins.def
index 49d07364fa2..e3f37876210 100644
--- a/gcc/config/arm/arm-mve-builtins.def
+++ b/gcc/config/arm/arm-mve-builtins.def
@@ -35,6 +35,7 @@
 
 DEF_MVE_MODE (n, none, none, none)
 DEF_MVE_MODE (offset, none, none, bytes)
+DEF_MVE_MODE (r, none, none, none)
 
 #define REQUIRES_FLOAT false
 DEF_MVE_TYPE (mve_pred16_t, boolean_type_node)
-- 
2.34.1
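As a rough illustration of what a MODE_r ("_r") form means for users, here is a scalar model, not GCC code: unlike the "_n" forms, the shift count is a runtime scalar held in a register and applied uniformly to every lane. The intrinsic name `vshlq_r` and the convention that a negative count shifts right are assumptions based on the usual VSHL behaviour, not taken from this patch.

```python
# Illustrative scalar model (NOT GCC code) of an MVE "_r"-mode intrinsic
# such as vshlq_r: one scalar shift count, applied to all lanes.
# Assumption: as with the usual VSHL semantics, a negative count shifts right.
def vshlq_r_model(vec, count, bits=8):
    mask = (1 << bits) - 1              # lanes wrap at the element width
    if count >= 0:
        return [(x << count) & mask for x in vec]
    return [(x & mask) >> -count for x in vec]

print(vshlq_r_model([1, 2, 3, 4], 2))     # every lane shifted left by 2
print(vshlq_r_model([8, 16, 32, 64], -3)) # negative count: right shift
```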



[PATCH 08/23] arm: [MVE intrinsics] add binary_lshift shape

2023-05-05 Thread Christophe Lyon via Gcc-patches
This patch adds the binary_lshift shape description.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_lshift): New.
* config/arm/arm-mve-builtins-shapes.h (binary_lshift): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 57 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 58 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 28a2d66ddd1..e5093c3f29d 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -519,6 +519,63 @@ struct binary_round_lshift_def : public overloaded_base<0>
 };
 SHAPE (binary_round_lshift)
 
+/* _t vfoo[_t0](_t, _t)
+   _t vfoo_n[_t0](_t, const int)
+
+   i.e. the standard shape for left shift operations that operate on
+   vector types.
+
+   For the MODE_n versions, check that 'imm' is in the [0..#bits-1] range.
+
+   Example: vshlq.
+   int8x16_t [__arm_]vshlq[_s8](int8x16_t a, int8x16_t b)
+   int8x16_t [__arm_]vshlq_m[_s8](int8x16_t inactive, int8x16_t a, int8x16_t 
b, mve_pred16_t p)
+   int8x16_t [__arm_]vshlq_x[_s8](int8x16_t a, int8x16_t b, mve_pred16_t p)
+   int8x16_t [__arm_]vshlq_n[_s8](int8x16_t a, const int imm)
+   int8x16_t [__arm_]vshlq_m_n[_s8](int8x16_t inactive, int8x16_t a, const int 
imm, mve_pred16_t p)
+   int8x16_t [__arm_]vshlq_x_n[_s8](int8x16_t a, const int imm, mve_pred16_t 
p)  */
+struct binary_lshift_def : public overloaded_base<0>
+{
+  bool
+  explicit_mode_suffix_p (enum predication_index, enum mode_suffix_index) 
const override
+  {
+return true;
+  }
+
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+b.add_overloaded_functions (group, MODE_n, preserve_user_namespace);
+build_all (b, "v0,v0,vs0", group, MODE_none, preserve_user_namespace);
+build_all (b, "v0,v0,ss32", group, MODE_n, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (2, i, nargs)
+   || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+return r.finish_opt_n_resolution (i, 0, type, TYPE_signed);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+if (c.mode_suffix_id != MODE_n)
+  return true;
+
+unsigned int bits = c.type_suffix (0).element_bits;
+return c.require_immediate_range (1, 0, bits - 1);
+  }
+};
+SHAPE (binary_lshift)
+
 /* xN_t vfoo[_t0](uint64_t, uint64_t)
 
where there are N arguments in total.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index cef081aa8ec..e472862ceef 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -35,6 +35,7 @@ namespace arm_mve
   {
 
 extern const function_shape *const binary;
+extern const function_shape *const binary_lshift;
 extern const function_shape *const binary_opt_n;
 extern const function_shape *const binary_orrq;
 extern const function_shape *const binary_round_lshift;
-- 
2.34.1
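The check() hook above enforces that a MODE_n left-shift immediate lies in [0 .. element_bits-1]. A minimal sketch of that range check (illustrative only, not the GCC implementation):

```python
# Sketch of the binary_lshift immediate-range rule for MODE_n forms:
# a left-shift immediate must be in [0 .. element_bits - 1].
def lshift_imm_ok(imm, element_bits):
    return 0 <= imm <= element_bits - 1

print(lshift_imm_ok(7, 8))   # largest valid shift for 8-bit lanes
print(lshift_imm_ok(8, 8))   # out of range: rejected at compile time
```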



[PATCH 07/23] arm: [MVE intrinsics] rework vabdq

2023-05-05 Thread Christophe Lyon via Gcc-patches
Implement vabdq using the new MVE builtins framework.

2022-09-08  Christophe Lyon 

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITHOUT_N): New.
(vabdq): New.
* config/arm/arm-mve-builtins-base.def (vabdq): New.
* config/arm/arm-mve-builtins-base.h (vabdq): New.
* config/arm/arm_mve.h (vabdq): Remove.
(vabdq_m): Remove.
(vabdq_x): Remove.
(vabdq_u8): Remove.
(vabdq_s8): Remove.
(vabdq_u16): Remove.
(vabdq_s16): Remove.
(vabdq_u32): Remove.
(vabdq_s32): Remove.
(vabdq_f16): Remove.
(vabdq_f32): Remove.
(vabdq_m_s8): Remove.
(vabdq_m_s32): Remove.
(vabdq_m_s16): Remove.
(vabdq_m_u8): Remove.
(vabdq_m_u32): Remove.
(vabdq_m_u16): Remove.
(vabdq_m_f32): Remove.
(vabdq_m_f16): Remove.
(vabdq_x_s8): Remove.
(vabdq_x_s16): Remove.
(vabdq_x_s32): Remove.
(vabdq_x_u8): Remove.
(vabdq_x_u16): Remove.
(vabdq_x_u32): Remove.
(vabdq_x_f16): Remove.
(vabdq_x_f32): Remove.
(__arm_vabdq_u8): Remove.
(__arm_vabdq_s8): Remove.
(__arm_vabdq_u16): Remove.
(__arm_vabdq_s16): Remove.
(__arm_vabdq_u32): Remove.
(__arm_vabdq_s32): Remove.
(__arm_vabdq_m_s8): Remove.
(__arm_vabdq_m_s32): Remove.
(__arm_vabdq_m_s16): Remove.
(__arm_vabdq_m_u8): Remove.
(__arm_vabdq_m_u32): Remove.
(__arm_vabdq_m_u16): Remove.
(__arm_vabdq_x_s8): Remove.
(__arm_vabdq_x_s16): Remove.
(__arm_vabdq_x_s32): Remove.
(__arm_vabdq_x_u8): Remove.
(__arm_vabdq_x_u16): Remove.
(__arm_vabdq_x_u32): Remove.
(__arm_vabdq_f16): Remove.
(__arm_vabdq_f32): Remove.
(__arm_vabdq_m_f32): Remove.
(__arm_vabdq_m_f16): Remove.
(__arm_vabdq_x_f16): Remove.
(__arm_vabdq_x_f32): Remove.
(__arm_vabdq): Remove.
(__arm_vabdq_m): Remove.
(__arm_vabdq_x): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |  10 +
 gcc/config/arm/arm-mve-builtins-base.def |   2 +
 gcc/config/arm/arm-mve-builtins-base.h   |   1 +
 gcc/config/arm/arm_mve.h | 431 ---
 4 files changed, 13 insertions(+), 431 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index 8c125657c67..a74119db917 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -146,6 +146,16 @@ namespace arm_mve {
 UNSPEC##_M_S, -1, -1,  \
 UNSPEC##_M_N_S, -1, -1))
 
+  /* Helper for builtins with only unspec codes, _m predicated
+ overrides, but no _n version.  */
+#define FUNCTION_WITHOUT_N(NAME, UNSPEC) FUNCTION  \
+  (NAME, unspec_mve_function_exact_insn,   \
+   (UNSPEC##_S, UNSPEC##_U, UNSPEC##_F,
\
+-1, -1, -1,
\
+UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,  \
+-1, -1, -1))
+
+FUNCTION_WITHOUT_N (vabdq, VABDQ)
 FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
 FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
 FUNCTION_WITHOUT_M_N (vcreateq, VCREATEQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def 
b/gcc/config/arm/arm-mve-builtins-base.def
index 5b9966341ce..9230837fd43 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -18,6 +18,7 @@
.  */
 
 #define REQUIRES_FLOAT false
+DEF_MVE_FUNCTION (vabdq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vandq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vcreateq, create, all_integer_with_64, none)
@@ -41,6 +42,7 @@ DEF_MVE_FUNCTION (vuninitializedq, inherent, 
all_integer_with_64, none)
 #undef REQUIRES_FLOAT
 
 #define REQUIRES_FLOAT true
+DEF_MVE_FUNCTION (vabdq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vandq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vcreateq, create, all_float, none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h 
b/gcc/config/arm/arm-mve-builtins-base.h
index eeb747d52ad..d9d45d1925a 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -23,6 +23,7 @@
 namespace arm_mve {
 namespace functions {
 
+extern const function_base *const vabdq;
 extern const function_base *const vaddq;
 extern const function_base *const vandq;
 extern const function_base *const vcreateq;
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 44b383dbe08..175d9955c33 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -77

[PATCH 05/23] arm: [MVE intrinsics] rework vqrdmulhq

2023-05-05 Thread Christophe Lyon via Gcc-patches
Implement vqrdmulhq using the new MVE builtins framework.

2022-09-08  Christophe Lyon 

gcc/
* config/arm/arm-mve-builtins-base.cc (vqrdmulhq): New.
* config/arm/arm-mve-builtins-base.def (vqrdmulhq): New.
* config/arm/arm-mve-builtins-base.h (vqrdmulhq): New.
* config/arm/arm_mve.h (vqrdmulhq): Remove.
(vqrdmulhq_m): Remove.
(vqrdmulhq_s8): Remove.
(vqrdmulhq_n_s8): Remove.
(vqrdmulhq_s16): Remove.
(vqrdmulhq_n_s16): Remove.
(vqrdmulhq_s32): Remove.
(vqrdmulhq_n_s32): Remove.
(vqrdmulhq_m_n_s8): Remove.
(vqrdmulhq_m_n_s32): Remove.
(vqrdmulhq_m_n_s16): Remove.
(vqrdmulhq_m_s8): Remove.
(vqrdmulhq_m_s32): Remove.
(vqrdmulhq_m_s16): Remove.
(__arm_vqrdmulhq_s8): Remove.
(__arm_vqrdmulhq_n_s8): Remove.
(__arm_vqrdmulhq_s16): Remove.
(__arm_vqrdmulhq_n_s16): Remove.
(__arm_vqrdmulhq_s32): Remove.
(__arm_vqrdmulhq_n_s32): Remove.
(__arm_vqrdmulhq_m_n_s8): Remove.
(__arm_vqrdmulhq_m_n_s32): Remove.
(__arm_vqrdmulhq_m_n_s16): Remove.
(__arm_vqrdmulhq_m_s8): Remove.
(__arm_vqrdmulhq_m_s32): Remove.
(__arm_vqrdmulhq_m_s16): Remove.
(__arm_vqrdmulhq): Remove.
(__arm_vqrdmulhq_m): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   1 +
 gcc/config/arm/arm-mve-builtins-base.def |   1 +
 gcc/config/arm/arm-mve-builtins-base.h   |   1 +
 gcc/config/arm/arm_mve.h | 213 ---
 4 files changed, 3 insertions(+), 213 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index f5e48519b19..8c125657c67 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -158,6 +158,7 @@ FUNCTION_WITH_RTX_M_N_NO_N_F (vorrq, IOR, VORRQ)
 FUNCTION_WITH_M_N_NO_F (vqaddq, VQADDQ)
 FUNCTION_WITH_M_N_NO_U_F (vqdmulhq, VQDMULHQ)
 FUNCTION_WITH_M_N_NO_F (vqrshlq, VQRSHLQ)
+FUNCTION_WITH_M_N_NO_U_F (vqrdmulhq, VQRDMULHQ)
 FUNCTION_WITH_M_N_NO_F (vqsubq, VQSUBQ)
 FUNCTION (vreinterpretq, vreinterpretq_impl,)
 FUNCTION_WITHOUT_N_NO_F (vrhaddq, VRHADDQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def 
b/gcc/config/arm/arm-mve-builtins-base.def
index e6dc2b00aaa..5b9966341ce 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -29,6 +29,7 @@ DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_integer, 
mx_or_none)
 DEF_MVE_FUNCTION (vorrq, binary_orrq, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vqaddq, binary_opt_n, all_integer, m_or_none)
 DEF_MVE_FUNCTION (vqdmulhq, binary_opt_n, all_signed, m_or_none)
+DEF_MVE_FUNCTION (vqrdmulhq, binary_opt_n, all_signed, m_or_none)
 DEF_MVE_FUNCTION (vqrshlq, binary_round_lshift, all_integer, m_or_none)
 DEF_MVE_FUNCTION (vqsubq, binary_opt_n, all_integer, m_or_none)
 DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer, none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h 
b/gcc/config/arm/arm-mve-builtins-base.h
index 31ba3fece82..eeb747d52ad 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -34,6 +34,7 @@ extern const function_base *const vmulq;
 extern const function_base *const vorrq;
 extern const function_base *const vqaddq;
 extern const function_base *const vqdmulhq;
+extern const function_base *const vqrdmulhq;
 extern const function_base *const vqrshlq;
 extern const function_base *const vqsubq;
 extern const function_base *const vreinterpretq;
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 636945d6ef0..44b383dbe08 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -94,7 +94,6 @@
 #define vcmpgtq(__a, __b) __arm_vcmpgtq(__a, __b)
 #define vcmpgeq(__a, __b) __arm_vcmpgeq(__a, __b)
 #define vqshluq(__a, __imm) __arm_vqshluq(__a, __imm)
-#define vqrdmulhq(__a, __b) __arm_vqrdmulhq(__a, __b)
 #define vmlsdavxq(__a, __b) __arm_vmlsdavxq(__a, __b)
 #define vmlsdavq(__a, __b) __arm_vmlsdavq(__a, __b)
 #define vmladavxq(__a, __b) __arm_vmladavxq(__a, __b)
@@ -249,7 +248,6 @@
 #define vqrdmlashq_m(__a, __b, __c, __p) __arm_vqrdmlashq_m(__a, __b, __c, __p)
 #define vqrdmlsdhq_m(__inactive, __a, __b, __p) __arm_vqrdmlsdhq_m(__inactive, 
__a, __b, __p)
 #define vqrdmlsdhxq_m(__inactive, __a, __b, __p) 
__arm_vqrdmlsdhxq_m(__inactive, __a, __b, __p)
-#define vqrdmulhq_m(__inactive, __a, __b, __p) __arm_vqrdmulhq_m(__inactive, 
__a, __b, __p)
 #define vqshlq_m_n(__inactive, __a, __imm, __p) __arm_vqshlq_m_n(__inactive, 
__a, __imm, __p)
 #define vqshlq_m(__inactive, __a, __b, __p) __arm_vqshlq_m(__inactive, __a, 
__b, __p)
 #define vrshrq_m(__inactive, __a, __imm, __p) __arm_vrshrq_m(__inactive, __a, 
__imm, __p)
@@ -682,8 +680,6 @@
 #define vshlq_r_s8(__a, __b) __arm_vshlq_r_s8(__a, __b)
 #define vqshlq_s8(__a, __b) __arm_vqshlq_s8(__a, __b)
 #define vqshlq_r_s8(_

[PATCH 21/23] arm: [MVE intrinsics] add binary_rshift shape

2023-05-05 Thread Christophe Lyon via Gcc-patches
This patch adds the binary_rshift shape description.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_rshift): New.
* config/arm/arm-mve-builtins-shapes.h (binary_rshift): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 36 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 37 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index e3bf586565c..7078f7d7220 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -365,6 +365,42 @@ struct binary_def : public overloaded_base<0>
 };
 SHAPE (binary)
 
+/* _t vfoo[_n_t0](_t, const int)
+
+   Shape for vector shift right operations that take a vector first
+   argument and an integer, and produce a vector.
+
+   Check that 'imm' is in the [1..#bits] range.
+
+   Example: vrshrq.
+   int8x16_t [__arm_]vrshrq[_n_s8](int8x16_t a, const int imm)
+   int8x16_t [__arm_]vrshrq_m[_n_s8](int8x16_t inactive, int8x16_t a, const 
int imm, mve_pred16_t p)
+   int8x16_t [__arm_]vrshrq_x[_n_s8](int8x16_t a, const int imm, mve_pred16_t 
p)  */
+struct binary_rshift_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_n, preserve_user_namespace);
+build_all (b, "v0,v0,ss32", group, MODE_n, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+return r.resolve_uniform (1, 1);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+unsigned int bits = c.type_suffix (0).element_bits;
+return c.require_immediate_range (1, 1, bits);
+  }
+};
+SHAPE (binary_rshift)
+
 /* _t vfoo[_t0](_t, _t)
_t vfoo[_n_t0](_t, _t)
 
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index ca1c1017e8e..09e00b69e63 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -40,6 +40,7 @@ namespace arm_mve
 extern const function_shape *const binary_opt_n;
 extern const function_shape *const binary_orrq;
 extern const function_shape *const binary_round_lshift;
+extern const function_shape *const binary_rshift;
 extern const function_shape *const binary_rshift_narrow;
 extern const function_shape *const binary_rshift_narrow_unsigned;
 extern const function_shape *const create;
-- 
2.34.1
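Note the asymmetry with the left-shift shape: right-shift immediates are 1-based, so the valid range is [1 .. element_bits]. A minimal sketch of that check (illustrative only, not the GCC implementation):

```python
# Sketch of the binary_rshift immediate-range rule:
# a right-shift immediate must be in [1 .. element_bits];
# shifting by 0 is not encodable for these instructions.
def rshift_imm_ok(imm, element_bits):
    return 1 <= imm <= element_bits

print(rshift_imm_ok(8, 8))   # shifting by the full lane width is allowed
print(rshift_imm_ok(0, 8))   # a zero shift is rejected
```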



[PATCH 18/23] arm: [MVE intrinsics] add binary_rshift_narrow_unsigned shape

2023-05-05 Thread Christophe Lyon via Gcc-patches
This patch adds the binary_rshift_narrow_unsigned shape description.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc
(binary_rshift_narrow_unsigned): New.
* config/arm/arm-mve-builtins-shapes.h
(binary_rshift_narrow_unsigned): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 48 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 49 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 88934e1ca15..e3bf586565c 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -664,6 +664,54 @@ struct binary_rshift_narrow_def : public overloaded_base<0>
 };
 SHAPE (binary_rshift_narrow)
 
+/* _t vfoo[_n_t0](_t, _t, const int)
+
+   Vector saturating (optionally rounding) shift right and narrow.
+   Check that 'imm' is in the [1..#bits/2] range.
+
+   Example: vqshrunbq.
+   uint8x16_t [__arm_]vqshrunbq[_n_s16](uint8x16_t a, int16x8_t b, const int 
imm)
+   uint8x16_t [__arm_]vqshrunbq_m[_n_s16](uint8x16_t a, int16x8_t b, const int 
imm, mve_pred16_t p)  */
+struct binary_rshift_narrow_unsigned_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_n, preserve_user_namespace);
+build_all (b, "vhu0,vhu0,v0,ss32", group, MODE_n, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (3, i, nargs)
+   || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES
+   || !r.require_integer_immediate (i))
+  return error_mark_node;
+
+type_suffix_index narrow_suffix
+  = find_type_suffix (TYPE_unsigned,
+ type_suffixes[type].element_bits / 2);
+
+if (!r.require_matching_vector_type (0, narrow_suffix))
+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+unsigned int bits = c.type_suffix (0).element_bits;
+return c.require_immediate_range (2, 1, bits / 2);
+  }
+
+};
+SHAPE (binary_rshift_narrow_unsigned)
+
 /* xN_t vfoo[_t0](uint64_t, uint64_t)
 
where there are N arguments in total.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index d72686d187b..ca1c1017e8e 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -41,6 +41,7 @@ namespace arm_mve
 extern const function_shape *const binary_orrq;
 extern const function_shape *const binary_round_lshift;
 extern const function_shape *const binary_rshift_narrow;
+extern const function_shape *const binary_rshift_narrow_unsigned;
 extern const function_shape *const create;
 extern const function_shape *const inherent;
 extern const function_shape *const unary_convert;
-- 
2.34.1
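To make the vqshrunbq example above concrete, here is a scalar model of its lane behaviour, hedged as an illustration rather than a reference: each signed wide lane is shifted right, saturated to the unsigned narrow range, and written into the bottom (even-indexed) narrow lane, while the odd-indexed lanes keep their values from the first argument. The 16-bit-to-8-bit widths are one instance of the shape.

```python
# Illustrative scalar model (NOT GCC code) of vqshrunbq with s16 -> u8 lanes.
def vqshrunbq_model(a, b, imm):
    # a: 16 unsigned 8-bit lanes; b: 8 signed 16-bit lanes.
    out = list(a)
    for i, x in enumerate(b):
        # arithmetic shift right, then saturate to [0, 255]
        out[2 * i] = min(max(x >> imm, 0), 255)   # bottom (even) lanes only
    return out

print(vqshrunbq_model(list(range(16)), [300, -5, 2000, 0, 40, 7, 512, 100], 2))
```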



[PATCH 22/23] arm: [MVE intrinsics] factorize vshrq vrshrq

2023-05-05 Thread Christophe Lyon via Gcc-patches
Factorize vshrq and vrshrq so that they use the same pattern.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/iterators.md (MVE_VSHRQ_M_N, MVE_VSHRQ_N): New.
(mve_insn): Add vrshr, vshr.
* config/arm/mve.md (mve_vshrq_n_)
(mve_vrshrq_n_): Merge into ...
(@mve_q_n_): ... this.
(mve_vrshrq_m_n_, mve_vshrq_m_n_): Merge
into ...
(@mve_q_m_n_): ... this.
---
 gcc/config/arm/iterators.md | 14 +++
 gcc/config/arm/mve.md   | 46 +++--
 2 files changed, 22 insertions(+), 38 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 583206dac9e..53873704174 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -408,6 +408,16 @@ (define_int_iterator MVE_INT_N_BINARY   [
 VSUBQ_N_S VSUBQ_N_U
 ])
 
+(define_int_iterator MVE_VSHRQ_M_N [
+VRSHRQ_M_N_S VRSHRQ_M_N_U
+VSHRQ_M_N_S VSHRQ_M_N_U
+])
+
+(define_int_iterator MVE_VSHRQ_N [
+VRSHRQ_N_S VRSHRQ_N_U
+VSHRQ_N_S VSHRQ_N_U
+])
+
 (define_int_iterator MVE_INT_SU_N_BINARY   [
 VHADDQ_N_S VHADDQ_N_U
 VHSUBQ_N_S VHSUBQ_N_U
@@ -636,6 +646,8 @@ (define_int_attr mve_insn [
 (VRSHRNBQ_N_S "vrshrnb") (VRSHRNBQ_N_U "vrshrnb")
 (VRSHRNTQ_M_N_S "vrshrnt") (VRSHRNTQ_M_N_U "vrshrnt")
 (VRSHRNTQ_N_S "vrshrnt") (VRSHRNTQ_N_U "vrshrnt")
+(VRSHRQ_M_N_S "vrshr") (VRSHRQ_M_N_U "vrshr")
+(VRSHRQ_N_S "vrshr") (VRSHRQ_N_U "vrshr")
 (VSHLQ_M_N_S "vshl") (VSHLQ_M_N_U "vshl")
 (VSHLQ_M_R_S "vshl") (VSHLQ_M_R_U "vshl")
 (VSHLQ_M_S "vshl") (VSHLQ_M_U "vshl")
@@ -646,6 +658,8 @@ (define_int_attr mve_insn [
 (VSHRNBQ_N_S "vshrnb") (VSHRNBQ_N_U "vshrnb")
 (VSHRNTQ_M_N_S "vshrnt") (VSHRNTQ_M_N_U "vshrnt")
 (VSHRNTQ_N_S "vshrnt") (VSHRNTQ_N_U "vshrnt")
+(VSHRQ_M_N_S "vshr") (VSHRQ_M_N_U "vshr")
+(VSHRQ_N_S "vshr") (VSHRQ_N_U "vshr")
 (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
 (VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F "vsub")
 (VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F "vsub")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 20ce7ecb3d6..b5c89fd4105 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -728,18 +728,19 @@ (define_insn "@mve_q_"
(set_attr "length""8")])
 
 ;;
-;; [vshrq_n_s, vshrq_n_u])
+;; [vrshrq_n_s, vrshrq_n_u]
+;; [vshrq_n_s, vshrq_n_u]
 ;;
 ;; Version that takes an immediate as operand 2.
-(define_insn "mve_vshrq_n_"
+(define_insn "@mve_q_n_"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
   (match_operand:SI 2 "" "")]
-VSHRQ_N))
+MVE_VSHRQ_N))
   ]
   "TARGET_HAVE_MVE"
-  "vshr.\t%q0, %q1, %2"
+  ".\t%q0, %q1, %2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -1401,21 +1402,6 @@ (define_insn "mve_vqshluq_n_s"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vrshrq_n_s, vrshrq_n_u])
-;;
-(define_insn "mve_vrshrq_n_"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-  (match_operand:SI 2 "" "")]
-VRSHRQ_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vrshr.%#\t%q0, %q1, %2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vabdq_f]
 ;;
@@ -4661,35 +4647,19 @@ (define_insn "@mve_q_m_n_"
 
 ;;
 ;; [vrshrq_m_n_s, vrshrq_m_n_u])
-;;
-(define_insn "mve_vrshrq_m_n_"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-  (match_operand:MVE_2 2 "s_register_operand" "w")
-  (match_operand:SI 3 "" "")
-  (match_operand: 4 "vpr_register_operand" 
"Up")]
-VRSHRQ_M_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vrshrt.%#\t%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
 ;; [vshrq_m_n_s, vshrq_m_n_u])
 ;;
-(define_insn "mve_vshrq_m_n_"
+(define_insn "@mve_q_m_n_"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
   (match_operand:MVE_2 2 "s_register_operand" "w")
   (match_operand:SI 3 "" "")
   (match_operand: 4 "vpr_register_operand" 
"Up")]
-VSHRQ_M_N))
+MVE_VSHRQ_M_N))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vshrt.%#\t%q0, %q2, %3"
+  "vpst\;t.%#\t%q0, %q2, %3"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
-- 
2.34.1
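As context for the merge above, a scalar sketch (illustrative, not GCC code) of how the two instructions the shared pattern now covers differ: vshr truncates, while vrshr rounds to nearest by adding half the rounding increment, 2**(imm-1), before shifting.

```python
# vshr: plain arithmetic shift right (truncating).
def vshr(x, imm):
    return x >> imm

# vrshr: rounding shift right; add the rounding constant first.
def vrshr(x, imm):
    return (x + (1 << (imm - 1))) >> imm

print(vshr(7, 2), vrshr(7, 2))   # same input, different results
```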



[PATCH 03/23] arm: [MVE intrinsics] rework vrshlq vqrshlq

2023-05-05 Thread Christophe Lyon via Gcc-patches
Implement vrshlq, vqrshlq using the new MVE builtins framework.

2022-09-08  Christophe Lyon 

gcc/
* config/arm/arm-mve-builtins-base.cc (vqrshlq, vrshlq): New.
* config/arm/arm-mve-builtins-base.def (vqrshlq, vrshlq): New.
* config/arm/arm-mve-builtins-base.h (vqrshlq, vrshlq): New.
* config/arm/arm-mve-builtins.cc (has_inactive_argument): Handle
vqrshlq, vrshlq.
* config/arm/arm_mve.h (vrshlq): Remove.
(vrshlq_m_n): Remove.
(vrshlq_m): Remove.
(vrshlq_x): Remove.
(vrshlq_u8): Remove.
(vrshlq_n_u8): Remove.
(vrshlq_s8): Remove.
(vrshlq_n_s8): Remove.
(vrshlq_u16): Remove.
(vrshlq_n_u16): Remove.
(vrshlq_s16): Remove.
(vrshlq_n_s16): Remove.
(vrshlq_u32): Remove.
(vrshlq_n_u32): Remove.
(vrshlq_s32): Remove.
(vrshlq_n_s32): Remove.
(vrshlq_m_n_u8): Remove.
(vrshlq_m_n_s8): Remove.
(vrshlq_m_n_u16): Remove.
(vrshlq_m_n_s16): Remove.
(vrshlq_m_n_u32): Remove.
(vrshlq_m_n_s32): Remove.
(vrshlq_m_s8): Remove.
(vrshlq_m_s32): Remove.
(vrshlq_m_s16): Remove.
(vrshlq_m_u8): Remove.
(vrshlq_m_u32): Remove.
(vrshlq_m_u16): Remove.
(vrshlq_x_s8): Remove.
(vrshlq_x_s16): Remove.
(vrshlq_x_s32): Remove.
(vrshlq_x_u8): Remove.
(vrshlq_x_u16): Remove.
(vrshlq_x_u32): Remove.
(__arm_vrshlq_u8): Remove.
(__arm_vrshlq_n_u8): Remove.
(__arm_vrshlq_s8): Remove.
(__arm_vrshlq_n_s8): Remove.
(__arm_vrshlq_u16): Remove.
(__arm_vrshlq_n_u16): Remove.
(__arm_vrshlq_s16): Remove.
(__arm_vrshlq_n_s16): Remove.
(__arm_vrshlq_u32): Remove.
(__arm_vrshlq_n_u32): Remove.
(__arm_vrshlq_s32): Remove.
(__arm_vrshlq_n_s32): Remove.
(__arm_vrshlq_m_n_u8): Remove.
(__arm_vrshlq_m_n_s8): Remove.
(__arm_vrshlq_m_n_u16): Remove.
(__arm_vrshlq_m_n_s16): Remove.
(__arm_vrshlq_m_n_u32): Remove.
(__arm_vrshlq_m_n_s32): Remove.
(__arm_vrshlq_m_s8): Remove.
(__arm_vrshlq_m_s32): Remove.
(__arm_vrshlq_m_s16): Remove.
(__arm_vrshlq_m_u8): Remove.
(__arm_vrshlq_m_u32): Remove.
(__arm_vrshlq_m_u16): Remove.
(__arm_vrshlq_x_s8): Remove.
(__arm_vrshlq_x_s16): Remove.
(__arm_vrshlq_x_s32): Remove.
(__arm_vrshlq_x_u8): Remove.
(__arm_vrshlq_x_u16): Remove.
(__arm_vrshlq_x_u32): Remove.
(__arm_vrshlq): Remove.
(__arm_vrshlq_m_n): Remove.
(__arm_vrshlq_m): Remove.
(__arm_vrshlq_x): Remove.
(vqrshlq): Remove.
(vqrshlq_m_n): Remove.
(vqrshlq_m): Remove.
(vqrshlq_u8): Remove.
(vqrshlq_n_u8): Remove.
(vqrshlq_s8): Remove.
(vqrshlq_n_s8): Remove.
(vqrshlq_u16): Remove.
(vqrshlq_n_u16): Remove.
(vqrshlq_s16): Remove.
(vqrshlq_n_s16): Remove.
(vqrshlq_u32): Remove.
(vqrshlq_n_u32): Remove.
(vqrshlq_s32): Remove.
(vqrshlq_n_s32): Remove.
(vqrshlq_m_n_u8): Remove.
(vqrshlq_m_n_s8): Remove.
(vqrshlq_m_n_u16): Remove.
(vqrshlq_m_n_s16): Remove.
(vqrshlq_m_n_u32): Remove.
(vqrshlq_m_n_s32): Remove.
(vqrshlq_m_s8): Remove.
(vqrshlq_m_s32): Remove.
(vqrshlq_m_s16): Remove.
(vqrshlq_m_u8): Remove.
(vqrshlq_m_u32): Remove.
(vqrshlq_m_u16): Remove.
(__arm_vqrshlq_u8): Remove.
(__arm_vqrshlq_n_u8): Remove.
(__arm_vqrshlq_s8): Remove.
(__arm_vqrshlq_n_s8): Remove.
(__arm_vqrshlq_u16): Remove.
(__arm_vqrshlq_n_u16): Remove.
(__arm_vqrshlq_s16): Remove.
(__arm_vqrshlq_n_s16): Remove.
(__arm_vqrshlq_u32): Remove.
(__arm_vqrshlq_n_u32): Remove.
(__arm_vqrshlq_s32): Remove.
(__arm_vqrshlq_n_s32): Remove.
(__arm_vqrshlq_m_n_u8): Remove.
(__arm_vqrshlq_m_n_s8): Remove.
(__arm_vqrshlq_m_n_u16): Remove.
(__arm_vqrshlq_m_n_s16): Remove.
(__arm_vqrshlq_m_n_u32): Remove.
(__arm_vqrshlq_m_n_s32): Remove.
(__arm_vqrshlq_m_s8): Remove.
(__arm_vqrshlq_m_s32): Remove.
(__arm_vqrshlq_m_s16): Remove.
(__arm_vqrshlq_m_u8): Remove.
(__arm_vqrshlq_m_u32): Remove.
(__arm_vqrshlq_m_u16): Remove.
(__arm_vqrshlq): Remove.
(__arm_vqrshlq_m_n): Remove.
(__arm_vqrshlq_m): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   2 +
 gcc/config/arm/arm-mve-builtins-base.def |   2 +
 gcc/config/arm/arm-mve-builtins-base.h   |   2 +
 gcc/config/arm/arm-mve-builtins.cc   |   4 +-
 gcc/config/arm/arm_mve.h | 969 +--
 5 files changed, 18 insertions(+), 961 deletions(-)
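For readers unfamiliar with these intrinsics, a scalar model of the per-lane semantics being reworked here, hedged as an illustration only: vrshlq shifts each lane of its first operand by the corresponding signed lane of its second, where a negative count means a rounding right shift (vqrshlq additionally saturates the left shifts, which this sketch does not model).

```python
# Illustrative scalar model (NOT GCC code) of one vrshlq lane.
def vrshlq_lane(x, s, bits=8):
    mask = (1 << bits) - 1
    if s >= 0:
        return (x << s) & mask              # left shift, wrapping (non-saturating)
    return ((x + (1 << (-s - 1))) >> -s) & mask  # rounding right shift by -s

print([vrshlq_lane(x, s) for x, s in zip([3, 7, 0x90], [2, -2, 1])])
```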

[PATCH 13/23] arm: [MVE intrinsics] factorize vmaxq vminq

2023-05-05 Thread Christophe Lyon via Gcc-patches
Factorize vmaxq and vminq so that they use the same pattern.

2022-09-08  Christophe Lyon 

gcc/
* config/arm/iterators.md (MAX_MIN_SU): New.
(max_min_su_str): New.
(max_min_supf): New.
* config/arm/mve.md (mve_vmaxq_s, mve_vmaxq_u)
(mve_vminq_s, mve_vminq_u): Merge into ...
(mve_q_): ... this.
---
 gcc/config/arm/iterators.md | 11 ++
 gcc/config/arm/mve.md   | 44 +
 2 files changed, 16 insertions(+), 39 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 3133642ea82..9ff61e0573b 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -330,6 +330,9 @@ (define_code_iterator FCVT [unsigned_float float])
 ;; Saturating addition, subtraction
 (define_code_iterator SSPLUSMINUS [ss_plus ss_minus])
 
+;; Max/Min iterator, to factorize MVE patterns
+(define_code_iterator MAX_MIN_SU [smax umax smin umin])
+
 ;; MVE integer binary operations.
 (define_code_iterator MVE_INT_BINARY_RTX [plus minus mult])
 
@@ -1271,6 +1274,14 @@ (define_code_attr float_sup [(unsigned_float "u") (float 
"s")])
 
 (define_code_attr float_SUP [(unsigned_float "U") (float "S")])
 
+;; max/min for MVE
+(define_code_attr max_min_su_str [(smax "vmax") (umax "vmax") (smin "vmin") 
(umin "vmin")])
+
+(define_code_attr max_min_supf [
+(smax "s") (umax "u")
+(smin "s") (umin "u")
+])
+
 ;;
 ;; Int attributes
 ;;
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index c8cb4e430ac..44409b40e5f 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1106,29 +1106,20 @@ (define_insn "mve_vmaxavq_s"
 ])
 
 ;;
-;; [vmaxq_u, vmaxq_s])
+;; [vmaxq_u, vmaxq_s]
+;; [vminq_s, vminq_u]
 ;;
-(define_insn "mve_vmaxq_s"
+(define_insn "mve_q_"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (smax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+   (MAX_MIN_SU:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
(match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vmax.%#\t%q0, %q1, %q2"
+  ".%#\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
-(define_insn "mve_vmaxq_u"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (umax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
-   (match_operand:MVE_2 2 "s_register_operand" "w")))
-  ]
-  "TARGET_HAVE_MVE"
-  "vmax.%#\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
 
 ;;
 ;; [vmaxvq_u, vmaxvq_s])
@@ -1175,31 +1166,6 @@ (define_insn "mve_vminavq_s"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vminq_s, vminq_u])
-;;
-(define_insn "mve_vminq_s"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (smin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
-   (match_operand:MVE_2 2 "s_register_operand" "w")))
-  ]
-  "TARGET_HAVE_MVE"
-  "vmin.%#\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
-(define_insn "mve_vminq_u"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (umin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
-   (match_operand:MVE_2 2 "s_register_operand" "w")))
-  ]
-  "TARGET_HAVE_MVE"
-  "vmin.%#\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vminvq_u, vminvq_s])
 ;;
-- 
2.34.1
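A short sketch (not GCC code) of why the merged pattern still needs the signed/unsigned code attribute: the same bit pattern compares differently under smax and umax. Only max is shown; min is analogous.

```python
# The same 8-bit pattern 0xFF is -1 when signed, 255 when unsigned.
def to_signed(x, bits=8):
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >= 1 << (bits - 1) else x

def lane_max(a, b, bits=8, signed=True):
    key = (lambda v: to_signed(v, bits)) if signed else (lambda v: v & ((1 << bits) - 1))
    return max(a, b, key=key)

print(lane_max(0xFF, 1, signed=True))   # smax picks 1 (0xFF is -1)
print(lane_max(0xFF, 1, signed=False))  # umax picks 255
```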



[PATCH 19/23] arm: [MVE intrinsics] factorize vqrshrunb vqrshrunt vqshrunb vqshrunt

2023-05-05 Thread Christophe Lyon via Gcc-patches
Factorize vqrshrunb, vqrshrunt, vqshrunb, vqshrunt so that they use
existing patterns.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/iterators.md (MVE_SHRN_N): Add VQRSHRUNBQ,
VQRSHRUNTQ, VQSHRUNBQ, VQSHRUNTQ.
(MVE_SHRN_M_N): Likewise.
(mve_insn): Add vqrshrunb, vqrshrunt, vqshrunb, vqshrunt.
(isu): Add VQRSHRUNBQ, VQRSHRUNTQ, VQSHRUNBQ, VQSHRUNTQ.
(supf): Likewise.
* config/arm/mve.md (mve_vqrshrunbq_n_s): Remove.
(mve_vqrshruntq_n_s): Remove.
(mve_vqshrunbq_n_s): Remove.
(mve_vqshruntq_n_s): Remove.
(mve_vqrshrunbq_m_n_s): Remove.
(mve_vqrshruntq_m_n_s): Remove.
(mve_vqshrunbq_m_n_s): Remove.
(mve_vqshruntq_m_n_s): Remove.
---
 gcc/config/arm/iterators.md |  32 +
 gcc/config/arm/mve.md   | 140 +++-
 2 files changed, 40 insertions(+), 132 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index d64c924a513..583206dac9e 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -471,8 +471,12 @@ (define_int_iterator MVE_RSHIFT_N   [
 (define_int_iterator MVE_SHRN_N [
 VQRSHRNBQ_N_S VQRSHRNBQ_N_U
 VQRSHRNTQ_N_S VQRSHRNTQ_N_U
+VQRSHRUNBQ_N_S
+VQRSHRUNTQ_N_S
 VQSHRNBQ_N_S VQSHRNBQ_N_U
 VQSHRNTQ_N_S VQSHRNTQ_N_U
+VQSHRUNBQ_N_S
+VQSHRUNTQ_N_S
 VRSHRNBQ_N_S VRSHRNBQ_N_U
 VRSHRNTQ_N_S VRSHRNTQ_N_U
 VSHRNBQ_N_S VSHRNBQ_N_U
@@ -482,8 +486,12 @@ (define_int_iterator MVE_SHRN_N [
 (define_int_iterator MVE_SHRN_M_N [
 VQRSHRNBQ_M_N_S VQRSHRNBQ_M_N_U
 VQRSHRNTQ_M_N_S VQRSHRNTQ_M_N_U
+VQRSHRUNBQ_M_N_S
+VQRSHRUNTQ_M_N_S
 VQSHRNBQ_M_N_S VQSHRNBQ_M_N_U
 VQSHRNTQ_M_N_S VQSHRNTQ_M_N_U
+VQSHRUNBQ_M_N_S
+VQSHRUNTQ_M_N_S
 VRSHRNBQ_M_N_S VRSHRNBQ_M_N_U
 VRSHRNTQ_M_N_S VRSHRNTQ_M_N_U
 VSHRNBQ_M_N_S VSHRNBQ_M_N_U
@@ -594,6 +602,10 @@ (define_int_attr mve_insn [
 (VQRSHRNBQ_N_S "vqrshrnb") (VQRSHRNBQ_N_U "vqrshrnb")
 (VQRSHRNTQ_M_N_S "vqrshrnt") (VQRSHRNTQ_M_N_U "vqrshrnt")
 (VQRSHRNTQ_N_S "vqrshrnt") (VQRSHRNTQ_N_U "vqrshrnt")
+(VQRSHRUNBQ_M_N_S "vqrshrunb")
+(VQRSHRUNBQ_N_S "vqrshrunb")
+(VQRSHRUNTQ_M_N_S "vqrshrunt")
+(VQRSHRUNTQ_N_S "vqrshrunt")
 (VQSHLQ_M_N_S "vqshl") (VQSHLQ_M_N_U "vqshl")
 (VQSHLQ_M_R_S "vqshl") (VQSHLQ_M_R_U "vqshl")
 (VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
@@ -604,6 +616,10 @@ (define_int_attr mve_insn [
 (VQSHRNBQ_N_S "vqshrnb") (VQSHRNBQ_N_U "vqshrnb")
 (VQSHRNTQ_M_N_S "vqshrnt") (VQSHRNTQ_M_N_U "vqshrnt")
 (VQSHRNTQ_N_S "vqshrnt") (VQSHRNTQ_N_U "vqshrnt")
+(VQSHRUNBQ_M_N_S "vqshrunb")
+(VQSHRUNBQ_N_S "vqshrunb")
+(VQSHRUNTQ_M_N_S "vqshrunt")
+(VQSHRUNTQ_N_S "vqshrunt")
 (VQSUBQ_M_N_S "vqsub") (VQSUBQ_M_N_U "vqsub")
 (VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
 (VQSUBQ_N_S "vqsub") (VQSUBQ_N_U "vqsub")
@@ -640,10 +656,18 @@ (define_int_attr isu[
 (VQRSHRNBQ_N_S "s") (VQRSHRNBQ_N_U "u")
 (VQRSHRNTQ_M_N_S "s") (VQRSHRNTQ_M_N_U "u")
 (VQRSHRNTQ_N_S "s") (VQRSHRNTQ_N_U "u")
+(VQRSHRUNBQ_M_N_S "s")
+(VQRSHRUNBQ_N_S "s")
+(VQRSHRUNTQ_M_N_S "s")
+(VQRSHRUNTQ_N_S "s")
 (VQSHRNBQ_M_N_S "s") (VQSHRNBQ_M_N_U "u")
 (VQSHRNBQ_N_S "s") (VQSHRNBQ_N_U "u")
 (VQSHRNTQ_M_N_S "s") (VQSHRNTQ_M_N_U "u")
 (VQSHRNTQ_N_S "s") (VQSHRNTQ_N_U "u")
+(VQSHRUNBQ_M_N_S "s")
+(VQSHRUNBQ_N_S "s")
+(VQSHRUNTQ_M_N_S "s")
+(VQSHRUNTQ_N_S "s")
 (VRSHRNBQ_M_N_S "i") (VRSHRNBQ_M_N_U "i")
 (VRSHRNBQ_N_S "i") (VRSHRNBQ_N_U "i")
 (VRSHRNTQ_M_N_S "i") (VRSHRNTQ_M_N_U "i")
@@ -1816,6 +1840,14 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U 
"u") (VREV16Q_S "s")
   (VQRDMULHQ_M_N_S "s")
   (VQDMULHQ_S "s")
   (VQRDMULHQ_S "s")
+  (VQRSHRUNBQ_M_N_S "s")
+  (VQRSHRUNBQ_N_S "s")
+  (VQRSHRUNTQ_M_N_S "s")
+  (VQRSHRUNTQ_N_S "s")
+  (VQSHRUNBQ_M_N_S "s")
+  (VQSHRUNBQ_N_S "s")
+  (VQSHRUNTQ_M_N_S "

[PATCH 14/23] arm: [MVE intrinsics] rework vmaxq vminq

2023-05-05 Thread Christophe Lyon via Gcc-patches
Implement vmaxq and vminq using the new MVE builtins framework.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_RTX_M_NO_F): New.
(vmaxq, vminq): New.
* config/arm/arm-mve-builtins-base.def (vmaxq, vminq): New.
* config/arm/arm-mve-builtins-base.h (vmaxq, vminq): New.
* config/arm/arm_mve.h (vminq): Remove.
(vmaxq): Remove.
(vmaxq_m): Remove.
(vminq_m): Remove.
(vminq_x): Remove.
(vmaxq_x): Remove.
(vminq_u8): Remove.
(vmaxq_u8): Remove.
(vminq_s8): Remove.
(vmaxq_s8): Remove.
(vminq_u16): Remove.
(vmaxq_u16): Remove.
(vminq_s16): Remove.
(vmaxq_s16): Remove.
(vminq_u32): Remove.
(vmaxq_u32): Remove.
(vminq_s32): Remove.
(vmaxq_s32): Remove.
(vmaxq_m_s8): Remove.
(vmaxq_m_s32): Remove.
(vmaxq_m_s16): Remove.
(vmaxq_m_u8): Remove.
(vmaxq_m_u32): Remove.
(vmaxq_m_u16): Remove.
(vminq_m_s8): Remove.
(vminq_m_s32): Remove.
(vminq_m_s16): Remove.
(vminq_m_u8): Remove.
(vminq_m_u32): Remove.
(vminq_m_u16): Remove.
(vminq_x_s8): Remove.
(vminq_x_s16): Remove.
(vminq_x_s32): Remove.
(vminq_x_u8): Remove.
(vminq_x_u16): Remove.
(vminq_x_u32): Remove.
(vmaxq_x_s8): Remove.
(vmaxq_x_s16): Remove.
(vmaxq_x_s32): Remove.
(vmaxq_x_u8): Remove.
(vmaxq_x_u16): Remove.
(vmaxq_x_u32): Remove.
(__arm_vminq_u8): Remove.
(__arm_vmaxq_u8): Remove.
(__arm_vminq_s8): Remove.
(__arm_vmaxq_s8): Remove.
(__arm_vminq_u16): Remove.
(__arm_vmaxq_u16): Remove.
(__arm_vminq_s16): Remove.
(__arm_vmaxq_s16): Remove.
(__arm_vminq_u32): Remove.
(__arm_vmaxq_u32): Remove.
(__arm_vminq_s32): Remove.
(__arm_vmaxq_s32): Remove.
(__arm_vmaxq_m_s8): Remove.
(__arm_vmaxq_m_s32): Remove.
(__arm_vmaxq_m_s16): Remove.
(__arm_vmaxq_m_u8): Remove.
(__arm_vmaxq_m_u32): Remove.
(__arm_vmaxq_m_u16): Remove.
(__arm_vminq_m_s8): Remove.
(__arm_vminq_m_s32): Remove.
(__arm_vminq_m_s16): Remove.
(__arm_vminq_m_u8): Remove.
(__arm_vminq_m_u32): Remove.
(__arm_vminq_m_u16): Remove.
(__arm_vminq_x_s8): Remove.
(__arm_vminq_x_s16): Remove.
(__arm_vminq_x_s32): Remove.
(__arm_vminq_x_u8): Remove.
(__arm_vminq_x_u16): Remove.
(__arm_vminq_x_u32): Remove.
(__arm_vmaxq_x_s8): Remove.
(__arm_vmaxq_x_s16): Remove.
(__arm_vmaxq_x_s32): Remove.
(__arm_vmaxq_x_u8): Remove.
(__arm_vmaxq_x_u16): Remove.
(__arm_vmaxq_x_u32): Remove.
(__arm_vminq): Remove.
(__arm_vmaxq): Remove.
(__arm_vmaxq_m): Remove.
(__arm_vminq_m): Remove.
(__arm_vminq_x): Remove.
(__arm_vmaxq_x): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |  11 +
 gcc/config/arm/arm-mve-builtins-base.def |   2 +
 gcc/config/arm/arm-mve-builtins-base.h   |   2 +
 gcc/config/arm/arm_mve.h | 628 ---
 4 files changed, 15 insertions(+), 628 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index 4bebf86f784..1839d5cb1a5 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -110,6 +110,15 @@ namespace arm_mve {
 UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,  \
 UNSPEC##_M_N_S, UNSPEC##_M_N_U, -1))
 
+  /* Helper for builtins with RTX codes, _m predicated override, but
+ no floating-point versions.  */
+#define FUNCTION_WITH_RTX_M_NO_F(NAME, RTX_S, RTX_U, UNSPEC) FUNCTION  \
+  (NAME, unspec_based_mve_function_exact_insn, \
+   (RTX_S, RTX_U, UNKNOWN, \
+-1, -1, -1,
\
+UNSPEC##_M_S, UNSPEC##_M_U, -1,\
+-1, -1, -1))
+
   /* Helper for builtins without RTX codes, no _m predicated and no _n
  overrides.  */
 #define FUNCTION_WITHOUT_M_N(NAME, UNSPEC) FUNCTION\
@@ -173,6 +182,8 @@ FUNCTION_WITHOUT_M_N (vcreateq, VCREATEQ)
 FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
 FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
 FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ)
+FUNCTION_WITH_RTX_M_NO_F (vmaxq, SMAX, UMAX, VMAXQ)
+FUNCTION_WITH_RTX_M_NO_F (vminq, SMIN, UMIN, VMINQ)
 FUNCTION_WITHOUT_N_NO_F (vmulhq, VMULHQ)
 FUNCTION_WITH_RTX_M_N (vmulq, MULT, VMULQ)
 FUNCTION_WITH_RTX_M_N_NO_N_F (vorrq, IOR, VORRQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def 
b/gcc/config/arm/arm-mve-builtins-base.def
index f2e40cda2af..3b42bf46e81 10

[PATCH 15/23] arm: [MVE intrinsics] add binary_rshift_narrow shape

2023-05-05 Thread Christophe Lyon via Gcc-patches
This patch adds the binary_rshift_narrow shape description.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_rshift_narrow):
New.
* config/arm/arm-mve-builtins-shapes.h (binary_rshift_narrow): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 47 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 48 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 4ecb612ece5..88934e1ca15 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -617,6 +617,53 @@ struct binary_lshift_r_def : public overloaded_base<0>
 };
 SHAPE (binary_lshift_r)
 
+/* _t vfoo[_n_t0](_t, _t, const int)
+
+   Narrowing right shifts.
+   Check that 'imm' is in the [1..#bits/2] range.
+
+   Example: vqrshrnbq.
+   int8x16_t [__arm_]vqrshrnbq[_n_s16](int8x16_t a, int16x8_t b, const int imm)
+   int8x16_t [__arm_]vqrshrnbq_m[_n_s16](int8x16_t a, int16x8_t b, const int 
imm, mve_pred16_t p)  */
+struct binary_rshift_narrow_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_n, preserve_user_namespace);
+build_all (b, "vh0,vh0,v0,ss32", group, MODE_n, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (3, i, nargs)
+   || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES
+   || !r.require_integer_immediate (i))
+  return error_mark_node;
+
+type_suffix_index narrow_suffix
+  = find_type_suffix (type_suffixes[type].tclass,
+ type_suffixes[type].element_bits / 2);
+
+if (!r.require_matching_vector_type (0, narrow_suffix))
+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+unsigned int bits = c.type_suffix (0).element_bits;
+return c.require_immediate_range (2, 1, bits / 2);
+  }
+};
+SHAPE (binary_rshift_narrow)
+
 /* xN_t vfoo[_t0](uint64_t, uint64_t)
 
where there are N arguments in total.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index 25d9b60a670..d72686d187b 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -40,6 +40,7 @@ namespace arm_mve
 extern const function_shape *const binary_opt_n;
 extern const function_shape *const binary_orrq;
 extern const function_shape *const binary_round_lshift;
+extern const function_shape *const binary_rshift_narrow;
 extern const function_shape *const create;
 extern const function_shape *const inherent;
 extern const function_shape *const unary_convert;
-- 
2.34.1



[PATCH 20/23] arm: [MVE intrinsics] rework vqrshrunbq vqrshruntq vqshrunbq vqshruntq

2023-05-05 Thread Christophe Lyon via Gcc-patches
Implement vqrshrunbq, vqrshruntq, vqshrunbq, vqshruntq using the new
MVE builtins framework.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_ONLY_N_NO_U_F): New.
(vqshrunbq, vqshruntq, vqrshrunbq, vqrshruntq): New.
* config/arm/arm-mve-builtins-base.def (vqshrunbq, vqshruntq)
(vqrshrunbq, vqrshruntq): New.
* config/arm/arm-mve-builtins-base.h (vqshrunbq, vqshruntq)
(vqrshrunbq, vqrshruntq): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vqshrunbq,
vqshruntq, vqrshrunbq, vqrshruntq.
* config/arm/arm_mve.h (vqrshrunbq): Remove.
(vqrshruntq): Remove.
(vqrshrunbq_m): Remove.
(vqrshruntq_m): Remove.
(vqrshrunbq_n_s16): Remove.
(vqrshrunbq_n_s32): Remove.
(vqrshruntq_n_s16): Remove.
(vqrshruntq_n_s32): Remove.
(vqrshrunbq_m_n_s32): Remove.
(vqrshrunbq_m_n_s16): Remove.
(vqrshruntq_m_n_s32): Remove.
(vqrshruntq_m_n_s16): Remove.
(__arm_vqrshrunbq_n_s16): Remove.
(__arm_vqrshrunbq_n_s32): Remove.
(__arm_vqrshruntq_n_s16): Remove.
(__arm_vqrshruntq_n_s32): Remove.
(__arm_vqrshrunbq_m_n_s32): Remove.
(__arm_vqrshrunbq_m_n_s16): Remove.
(__arm_vqrshruntq_m_n_s32): Remove.
(__arm_vqrshruntq_m_n_s16): Remove.
(__arm_vqrshrunbq): Remove.
(__arm_vqrshruntq): Remove.
(__arm_vqrshrunbq_m): Remove.
(__arm_vqrshruntq_m): Remove.
(vqshrunbq): Remove.
(vqshruntq): Remove.
(vqshrunbq_m): Remove.
(vqshruntq_m): Remove.
(vqshrunbq_n_s16): Remove.
(vqshruntq_n_s16): Remove.
(vqshrunbq_n_s32): Remove.
(vqshruntq_n_s32): Remove.
(vqshrunbq_m_n_s32): Remove.
(vqshrunbq_m_n_s16): Remove.
(vqshruntq_m_n_s32): Remove.
(vqshruntq_m_n_s16): Remove.
(__arm_vqshrunbq_n_s16): Remove.
(__arm_vqshruntq_n_s16): Remove.
(__arm_vqshrunbq_n_s32): Remove.
(__arm_vqshruntq_n_s32): Remove.
(__arm_vqshrunbq_m_n_s32): Remove.
(__arm_vqshrunbq_m_n_s16): Remove.
(__arm_vqshruntq_m_n_s32): Remove.
(__arm_vqshruntq_m_n_s16): Remove.
(__arm_vqshrunbq): Remove.
(__arm_vqshruntq): Remove.
(__arm_vqshrunbq_m): Remove.
(__arm_vqshruntq_m): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |  13 +
 gcc/config/arm/arm-mve-builtins-base.def |   4 +
 gcc/config/arm/arm-mve-builtins-base.h   |   4 +
 gcc/config/arm/arm-mve-builtins.cc   |   4 +
 gcc/config/arm/arm_mve.h | 320 ---
 5 files changed, 25 insertions(+), 320 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index c95abe70239..e7d2e0abffc 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -184,6 +184,15 @@ namespace arm_mve {
 -1, -1, -1,
\
 UNSPEC##_M_N_S, UNSPEC##_M_N_U, -1))
 
+  /* Helper for builtins with only unspec codes, _m predicated
+ overrides, only _n version, no unsigned, no floating-point.  */
+#define FUNCTION_ONLY_N_NO_U_F(NAME, UNSPEC) FUNCTION  \
+  (NAME, unspec_mve_function_exact_insn,   \
+   (-1, -1, -1,
\
+UNSPEC##_N_S, -1, -1,  \
+-1, -1, -1,
\
+UNSPEC##_M_N_S, -1, -1))
+
 FUNCTION_WITHOUT_N (vabdq, VABDQ)
 FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
 FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
@@ -203,8 +212,12 @@ FUNCTION_WITH_M_N_NO_U_F (vqrdmulhq, VQRDMULHQ)
 FUNCTION_WITH_M_N_R (vqshlq, VQSHLQ)
 FUNCTION_ONLY_N_NO_F (vqrshrnbq, VQRSHRNBQ)
 FUNCTION_ONLY_N_NO_F (vqrshrntq, VQRSHRNTQ)
+FUNCTION_ONLY_N_NO_U_F (vqrshrunbq, VQRSHRUNBQ)
+FUNCTION_ONLY_N_NO_U_F (vqrshruntq, VQRSHRUNTQ)
 FUNCTION_ONLY_N_NO_F (vqshrnbq, VQSHRNBQ)
 FUNCTION_ONLY_N_NO_F (vqshrntq, VQSHRNTQ)
+FUNCTION_ONLY_N_NO_U_F (vqshrunbq, VQSHRUNBQ)
+FUNCTION_ONLY_N_NO_U_F (vqshruntq, VQSHRUNTQ)
 FUNCTION_WITH_M_N_NO_F (vqsubq, VQSUBQ)
 FUNCTION (vreinterpretq, vreinterpretq_impl,)
 FUNCTION_WITHOUT_N_NO_F (vrhaddq, VRHADDQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def 
b/gcc/config/arm/arm-mve-builtins-base.def
index 3dd40086663..50cb2d055e9 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -36,10 +36,14 @@ DEF_MVE_FUNCTION (vqrdmulhq, binary_opt_n, all_signed, 
m_or_none)
 DEF_MVE_FUNCTION (vqrshlq, binary_round_lshift, all_integer, m_or_none)
 DEF_MVE_FUNCTION (vqrshrnbq, binary_rshift_narrow, integer_16_32, m_or_none)
 DEF_MVE_FUNCTION (vqrshrntq, binary_rshift_narrow, integ

[PATCH 16/23] arm: [MVE intrinsics] factorize vshrntq vshrnbq vrshrnbq vrshrntq vqshrnbq vqshrntq vqrshrnbq vqrshrntq

2023-05-05 Thread Christophe Lyon via Gcc-patches
Factorize vqshrnbq, vqshrntq, vqrshrnbq, vqrshrntq, vshrntq, vshrnbq,
vrshrnbq and vrshrntq so that they use the same pattern.

Introduce the <isu> iterator for *shrn* so that we can use the same
pattern despite the different "s", "u" and "i" suffixes.

2022-09-08  Christophe Lyon 

gcc/
* config/arm/iterators.md (MVE_SHRN_N, MVE_SHRN_M_N): New.
(mve_insn): Add vqrshrnb, vqrshrnt, vqshrnb, vqshrnt, vrshrnb,
vrshrnt, vshrnb, vshrnt.
(isu): New.
* config/arm/mve.md (mve_vqrshrnbq_n_)
(mve_vqrshrntq_n_, mve_vqshrnbq_n_)
(mve_vqshrntq_n_, mve_vrshrnbq_n_)
(mve_vrshrntq_n_, mve_vshrnbq_n_)
(mve_vshrntq_n_): Merge into ...
(@mve_q_n_): ... this.
(mve_vqrshrnbq_m_n_, mve_vqrshrntq_m_n_)
(mve_vqshrnbq_m_n_, mve_vqshrntq_m_n_)
(mve_vrshrnbq_m_n_, mve_vrshrntq_m_n_)
(mve_vshrnbq_m_n_, mve_vshrntq_m_n_):
Merge into ...
(@mve_q_m_n_): ... this.
---
 gcc/config/arm/iterators.md |  57 
 gcc/config/arm/mve.md   | 270 
 2 files changed, 85 insertions(+), 242 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 9ff61e0573b..d64c924a513 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -468,6 +468,28 @@ (define_int_iterator MVE_RSHIFT_N   [
 VRSHLQ_N_S VRSHLQ_N_U
 ])
 
+(define_int_iterator MVE_SHRN_N [
+VQRSHRNBQ_N_S VQRSHRNBQ_N_U
+VQRSHRNTQ_N_S VQRSHRNTQ_N_U
+VQSHRNBQ_N_S VQSHRNBQ_N_U
+VQSHRNTQ_N_S VQSHRNTQ_N_U
+VRSHRNBQ_N_S VRSHRNBQ_N_U
+VRSHRNTQ_N_S VRSHRNTQ_N_U
+VSHRNBQ_N_S VSHRNBQ_N_U
+VSHRNTQ_N_S VSHRNTQ_N_U
+])
+
+(define_int_iterator MVE_SHRN_M_N [
+VQRSHRNBQ_M_N_S VQRSHRNBQ_M_N_U
+VQRSHRNTQ_M_N_S VQRSHRNTQ_M_N_U
+VQSHRNBQ_M_N_S VQSHRNBQ_M_N_U
+VQSHRNTQ_M_N_S VQSHRNTQ_M_N_U
+VRSHRNBQ_M_N_S VRSHRNBQ_M_N_U
+VRSHRNTQ_M_N_S VRSHRNTQ_M_N_U
+VSHRNBQ_M_N_S VSHRNBQ_M_N_U
+VSHRNTQ_M_N_S VSHRNTQ_M_N_U
+])
+
 (define_int_iterator MVE_FP_M_BINARY   [
 VABDQ_M_F
 VADDQ_M_F
@@ -568,12 +590,20 @@ (define_int_attr mve_insn [
 (VQRSHLQ_M_S "vqrshl") (VQRSHLQ_M_U "vqrshl")
 (VQRSHLQ_N_S "vqrshl") (VQRSHLQ_N_U "vqrshl")
 (VQRSHLQ_S "vqrshl") (VQRSHLQ_U "vqrshl")
+(VQRSHRNBQ_M_N_S "vqrshrnb") (VQRSHRNBQ_M_N_U "vqrshrnb")
+(VQRSHRNBQ_N_S "vqrshrnb") (VQRSHRNBQ_N_U "vqrshrnb")
+(VQRSHRNTQ_M_N_S "vqrshrnt") (VQRSHRNTQ_M_N_U "vqrshrnt")
+(VQRSHRNTQ_N_S "vqrshrnt") (VQRSHRNTQ_N_U "vqrshrnt")
 (VQSHLQ_M_N_S "vqshl") (VQSHLQ_M_N_U "vqshl")
 (VQSHLQ_M_R_S "vqshl") (VQSHLQ_M_R_U "vqshl")
 (VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
 (VQSHLQ_N_S "vqshl") (VQSHLQ_N_U "vqshl")
 (VQSHLQ_R_S "vqshl") (VQSHLQ_R_U "vqshl")
 (VQSHLQ_S "vqshl") (VQSHLQ_U "vqshl")
+(VQSHRNBQ_M_N_S "vqshrnb") (VQSHRNBQ_M_N_U "vqshrnb")
+(VQSHRNBQ_N_S "vqshrnb") (VQSHRNBQ_N_U "vqshrnb")
+(VQSHRNTQ_M_N_S "vqshrnt") (VQSHRNTQ_M_N_U "vqshrnt")
+(VQSHRNTQ_N_S "vqshrnt") (VQSHRNTQ_N_U "vqshrnt")
 (VQSUBQ_M_N_S "vqsub") (VQSUBQ_M_N_U "vqsub")
 (VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
 (VQSUBQ_N_S "vqsub") (VQSUBQ_N_U "vqsub")
@@ -586,17 +616,44 @@ (define_int_attr mve_insn [
 (VRSHLQ_M_S "vrshl") (VRSHLQ_M_U "vrshl")
 (VRSHLQ_N_S "vrshl") (VRSHLQ_N_U "vrshl")
 (VRSHLQ_S "vrshl") (VRSHLQ_U "vrshl")
+(VRSHRNBQ_M_N_S "vrshrnb") (VRSHRNBQ_M_N_U "vrshrnb")
+(VRSHRNBQ_N_S "vrshrnb") (VRSHRNBQ_N_U "vrshrnb")
+(VRSHRNTQ_M_N_S "vrshrnt") (VRSHRNTQ_M_N_U "vrshrnt")
+(VRSHRNTQ_N_S "vrshrnt") (VRSHRNTQ_N_U "vrshrnt")
 (VSHLQ_M_N_S "vshl") (VSHLQ_M_N_U "vshl")
 (VSHLQ_M_R_S "vshl") (VSHLQ_M_R_U "vshl")
 (VSHLQ_M_S "vshl") (VSHLQ_M_U "vshl")
 (VSHLQ_N_S "vshl") (VSHLQ_N_U "vshl")
 (VSHLQ_R_S "vshl") (VSHLQ_R_U "vshl")
 (VSHLQ_S "vshl") (VSHLQ_U "vshl")
+(VSHRNBQ_M_N_S "vshrnb") (VSHRNBQ_M_N_U "vshrnb")
+(VSHRNBQ_N_S "vshrnb") (VSHRNBQ_N_U "vshrnb")
+(VSHRNTQ_M_N_S "vshrnt") (VSHRNTQ_M_N_U "vshrnt")
+(VSHRNTQ_N_S "vshrnt") (VSHRNTQ_N_U "vshrnt")
 (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
 (VSU

[PATCH 17/23] arm: [MVE intrinsics] rework vshrnbq vshrntq vrshrnbq vrshrntq vqshrnbq vqshrntq vqrshrnbq vqrshrntq

2023-05-05 Thread Christophe Lyon via Gcc-patches
Implement vshrnbq, vshrntq, vrshrnbq, vrshrntq, vqshrnbq, vqshrntq,
vqrshrnbq, vqrshrntq using the new MVE builtins framework.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_ONLY_N_NO_F): New.
(vshrnbq, vshrntq, vrshrnbq, vrshrntq, vqshrnbq, vqshrntq)
(vqrshrnbq, vqrshrntq): New.
* config/arm/arm-mve-builtins-base.def (vshrnbq, vshrntq)
(vrshrnbq, vrshrntq, vqshrnbq, vqshrntq, vqrshrnbq, vqrshrntq):
New.
* config/arm/arm-mve-builtins-base.h (vshrnbq, vshrntq, vrshrnbq)
(vrshrntq, vqshrnbq, vqshrntq, vqrshrnbq, vqrshrntq): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vshrnbq,
vshrntq, vrshrnbq, vrshrntq, vqshrnbq, vqshrntq, vqrshrnbq,
vqrshrntq.
* config/arm/arm_mve.h (vshrnbq): Remove.
(vshrntq): Remove.
(vshrnbq_m): Remove.
(vshrntq_m): Remove.
(vshrnbq_n_s16): Remove.
(vshrntq_n_s16): Remove.
(vshrnbq_n_u16): Remove.
(vshrntq_n_u16): Remove.
(vshrnbq_n_s32): Remove.
(vshrntq_n_s32): Remove.
(vshrnbq_n_u32): Remove.
(vshrntq_n_u32): Remove.
(vshrnbq_m_n_s32): Remove.
(vshrnbq_m_n_s16): Remove.
(vshrnbq_m_n_u32): Remove.
(vshrnbq_m_n_u16): Remove.
(vshrntq_m_n_s32): Remove.
(vshrntq_m_n_s16): Remove.
(vshrntq_m_n_u32): Remove.
(vshrntq_m_n_u16): Remove.
(__arm_vshrnbq_n_s16): Remove.
(__arm_vshrntq_n_s16): Remove.
(__arm_vshrnbq_n_u16): Remove.
(__arm_vshrntq_n_u16): Remove.
(__arm_vshrnbq_n_s32): Remove.
(__arm_vshrntq_n_s32): Remove.
(__arm_vshrnbq_n_u32): Remove.
(__arm_vshrntq_n_u32): Remove.
(__arm_vshrnbq_m_n_s32): Remove.
(__arm_vshrnbq_m_n_s16): Remove.
(__arm_vshrnbq_m_n_u32): Remove.
(__arm_vshrnbq_m_n_u16): Remove.
(__arm_vshrntq_m_n_s32): Remove.
(__arm_vshrntq_m_n_s16): Remove.
(__arm_vshrntq_m_n_u32): Remove.
(__arm_vshrntq_m_n_u16): Remove.
(__arm_vshrnbq): Remove.
(__arm_vshrntq): Remove.
(__arm_vshrnbq_m): Remove.
(__arm_vshrntq_m): Remove.
(vrshrnbq): Remove.
(vrshrntq): Remove.
(vrshrnbq_m): Remove.
(vrshrntq_m): Remove.
(vrshrnbq_n_s16): Remove.
(vrshrntq_n_s16): Remove.
(vrshrnbq_n_u16): Remove.
(vrshrntq_n_u16): Remove.
(vrshrnbq_n_s32): Remove.
(vrshrntq_n_s32): Remove.
(vrshrnbq_n_u32): Remove.
(vrshrntq_n_u32): Remove.
(vrshrnbq_m_n_s32): Remove.
(vrshrnbq_m_n_s16): Remove.
(vrshrnbq_m_n_u32): Remove.
(vrshrnbq_m_n_u16): Remove.
(vrshrntq_m_n_s32): Remove.
(vrshrntq_m_n_s16): Remove.
(vrshrntq_m_n_u32): Remove.
(vrshrntq_m_n_u16): Remove.
(__arm_vrshrnbq_n_s16): Remove.
(__arm_vrshrntq_n_s16): Remove.
(__arm_vrshrnbq_n_u16): Remove.
(__arm_vrshrntq_n_u16): Remove.
(__arm_vrshrnbq_n_s32): Remove.
(__arm_vrshrntq_n_s32): Remove.
(__arm_vrshrnbq_n_u32): Remove.
(__arm_vrshrntq_n_u32): Remove.
(__arm_vrshrnbq_m_n_s32): Remove.
(__arm_vrshrnbq_m_n_s16): Remove.
(__arm_vrshrnbq_m_n_u32): Remove.
(__arm_vrshrnbq_m_n_u16): Remove.
(__arm_vrshrntq_m_n_s32): Remove.
(__arm_vrshrntq_m_n_s16): Remove.
(__arm_vrshrntq_m_n_u32): Remove.
(__arm_vrshrntq_m_n_u16): Remove.
(__arm_vrshrnbq): Remove.
(__arm_vrshrntq): Remove.
(__arm_vrshrnbq_m): Remove.
(__arm_vrshrntq_m): Remove.
(vqshrnbq): Remove.
(vqshrntq): Remove.
(vqshrnbq_m): Remove.
(vqshrntq_m): Remove.
(vqshrnbq_n_s16): Remove.
(vqshrntq_n_s16): Remove.
(vqshrnbq_n_u16): Remove.
(vqshrntq_n_u16): Remove.
(vqshrnbq_n_s32): Remove.
(vqshrntq_n_s32): Remove.
(vqshrnbq_n_u32): Remove.
(vqshrntq_n_u32): Remove.
(vqshrnbq_m_n_s32): Remove.
(vqshrnbq_m_n_s16): Remove.
(vqshrnbq_m_n_u32): Remove.
(vqshrnbq_m_n_u16): Remove.
(vqshrntq_m_n_s32): Remove.
(vqshrntq_m_n_s16): Remove.
(vqshrntq_m_n_u32): Remove.
(vqshrntq_m_n_u16): Remove.
(__arm_vqshrnbq_n_s16): Remove.
(__arm_vqshrntq_n_s16): Remove.
(__arm_vqshrnbq_n_u16): Remove.
(__arm_vqshrntq_n_u16): Remove.
(__arm_vqshrnbq_n_s32): Remove.
(__arm_vqshrntq_n_s32): Remove.
(__arm_vqshrnbq_n_u32): Remove.
(__arm_vqshrntq_n_u32): Remove.
(__arm_vqshrnbq_m_n_s32): Remove.
(__arm_vqshrnbq_m_n_s16): Remove.
(__arm_vqshrnbq_m_n_u32): Remove.
(__arm_vqshrnbq_m_n_u16): Remove.
(__arm_vqshrntq_m_n_s32): Remove.
(__arm_vqshrntq_m_n

[PATCH 12/23] arm: [MVE intrinsics] rework vqshlq vshlq

2023-05-05 Thread Christophe Lyon via Gcc-patches
Implement vqshlq, vshlq using the new MVE builtins framework.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_M_N_R): New.
(vqshlq, vshlq): New.
* config/arm/arm-mve-builtins-base.def (vqshlq, vshlq): New.
* config/arm/arm-mve-builtins-base.h (vqshlq, vshlq): New.
* config/arm/arm_mve.h (vshlq): Remove.
(vshlq_r): Remove.
(vshlq_n): Remove.
(vshlq_m_r): Remove.
(vshlq_m): Remove.
(vshlq_m_n): Remove.
(vshlq_x): Remove.
(vshlq_x_n): Remove.
(vshlq_s8): Remove.
(vshlq_s16): Remove.
(vshlq_s32): Remove.
(vshlq_u8): Remove.
(vshlq_u16): Remove.
(vshlq_u32): Remove.
(vshlq_r_u8): Remove.
(vshlq_n_u8): Remove.
(vshlq_r_s8): Remove.
(vshlq_n_s8): Remove.
(vshlq_r_u16): Remove.
(vshlq_n_u16): Remove.
(vshlq_r_s16): Remove.
(vshlq_n_s16): Remove.
(vshlq_r_u32): Remove.
(vshlq_n_u32): Remove.
(vshlq_r_s32): Remove.
(vshlq_n_s32): Remove.
(vshlq_m_r_u8): Remove.
(vshlq_m_r_s8): Remove.
(vshlq_m_r_u16): Remove.
(vshlq_m_r_s16): Remove.
(vshlq_m_r_u32): Remove.
(vshlq_m_r_s32): Remove.
(vshlq_m_u8): Remove.
(vshlq_m_s8): Remove.
(vshlq_m_u16): Remove.
(vshlq_m_s16): Remove.
(vshlq_m_u32): Remove.
(vshlq_m_s32): Remove.
(vshlq_m_n_s8): Remove.
(vshlq_m_n_s32): Remove.
(vshlq_m_n_s16): Remove.
(vshlq_m_n_u8): Remove.
(vshlq_m_n_u32): Remove.
(vshlq_m_n_u16): Remove.
(vshlq_x_s8): Remove.
(vshlq_x_s16): Remove.
(vshlq_x_s32): Remove.
(vshlq_x_u8): Remove.
(vshlq_x_u16): Remove.
(vshlq_x_u32): Remove.
(vshlq_x_n_s8): Remove.
(vshlq_x_n_s16): Remove.
(vshlq_x_n_s32): Remove.
(vshlq_x_n_u8): Remove.
(vshlq_x_n_u16): Remove.
(vshlq_x_n_u32): Remove.
(__arm_vshlq_s8): Remove.
(__arm_vshlq_s16): Remove.
(__arm_vshlq_s32): Remove.
(__arm_vshlq_u8): Remove.
(__arm_vshlq_u16): Remove.
(__arm_vshlq_u32): Remove.
(__arm_vshlq_r_u8): Remove.
(__arm_vshlq_n_u8): Remove.
(__arm_vshlq_r_s8): Remove.
(__arm_vshlq_n_s8): Remove.
(__arm_vshlq_r_u16): Remove.
(__arm_vshlq_n_u16): Remove.
(__arm_vshlq_r_s16): Remove.
(__arm_vshlq_n_s16): Remove.
(__arm_vshlq_r_u32): Remove.
(__arm_vshlq_n_u32): Remove.
(__arm_vshlq_r_s32): Remove.
(__arm_vshlq_n_s32): Remove.
(__arm_vshlq_m_r_u8): Remove.
(__arm_vshlq_m_r_s8): Remove.
(__arm_vshlq_m_r_u16): Remove.
(__arm_vshlq_m_r_s16): Remove.
(__arm_vshlq_m_r_u32): Remove.
(__arm_vshlq_m_r_s32): Remove.
(__arm_vshlq_m_u8): Remove.
(__arm_vshlq_m_s8): Remove.
(__arm_vshlq_m_u16): Remove.
(__arm_vshlq_m_s16): Remove.
(__arm_vshlq_m_u32): Remove.
(__arm_vshlq_m_s32): Remove.
(__arm_vshlq_m_n_s8): Remove.
(__arm_vshlq_m_n_s32): Remove.
(__arm_vshlq_m_n_s16): Remove.
(__arm_vshlq_m_n_u8): Remove.
(__arm_vshlq_m_n_u32): Remove.
(__arm_vshlq_m_n_u16): Remove.
(__arm_vshlq_x_s8): Remove.
(__arm_vshlq_x_s16): Remove.
(__arm_vshlq_x_s32): Remove.
(__arm_vshlq_x_u8): Remove.
(__arm_vshlq_x_u16): Remove.
(__arm_vshlq_x_u32): Remove.
(__arm_vshlq_x_n_s8): Remove.
(__arm_vshlq_x_n_s16): Remove.
(__arm_vshlq_x_n_s32): Remove.
(__arm_vshlq_x_n_u8): Remove.
(__arm_vshlq_x_n_u16): Remove.
(__arm_vshlq_x_n_u32): Remove.
(__arm_vshlq): Remove.
(__arm_vshlq_r): Remove.
(__arm_vshlq_n): Remove.
(__arm_vshlq_m_r): Remove.
(__arm_vshlq_m): Remove.
(__arm_vshlq_m_n): Remove.
(__arm_vshlq_x): Remove.
(__arm_vshlq_x_n): Remove.
(vqshlq): Remove.
(vqshlq_r): Remove.
(vqshlq_n): Remove.
(vqshlq_m_r): Remove.
(vqshlq_m_n): Remove.
(vqshlq_m): Remove.
(vqshlq_u8): Remove.
(vqshlq_r_u8): Remove.
(vqshlq_n_u8): Remove.
(vqshlq_s8): Remove.
(vqshlq_r_s8): Remove.
(vqshlq_n_s8): Remove.
(vqshlq_u16): Remove.
(vqshlq_r_u16): Remove.
(vqshlq_n_u16): Remove.
(vqshlq_s16): Remove.
(vqshlq_r_s16): Remove.
(vqshlq_n_s16): Remove.
(vqshlq_u32): Remove.
(vqshlq_r_u32): Remove.
(vqshlq_n_u32): Remove.
(vqshlq_s32): Remove.
(vqshlq_r_s32): Remove.
(vqshlq_n_s32): Remove.
(vqshlq_m_r_u8): Remove.
(vqshlq_m_r_s8): Remove.
(vqshlq_m_r_u16): Remove.
(vqshlq_m_r_s16): Remove.

[PATCH 23/23] arm: [MVE intrinsics] rework vshrq vrshrq

2023-05-05 Thread Christophe Lyon via Gcc-patches
Implement vshrq and vrshrq using the new MVE builtins framework.

2022-09-08  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vrshrq, vshrq): New.
* config/arm/arm-mve-builtins-base.def (vrshrq, vshrq): New.
* config/arm/arm-mve-builtins-base.h (vrshrq, vshrq): New.
* config/arm/arm_mve.h (vshrq): Remove.
(vrshrq): Remove.
(vrshrq_m): Remove.
(vshrq_m): Remove.
(vrshrq_x): Remove.
(vshrq_x): Remove.
(vshrq_n_s8): Remove.
(vshrq_n_s16): Remove.
(vshrq_n_s32): Remove.
(vshrq_n_u8): Remove.
(vshrq_n_u16): Remove.
(vshrq_n_u32): Remove.
(vrshrq_n_u8): Remove.
(vrshrq_n_s8): Remove.
(vrshrq_n_u16): Remove.
(vrshrq_n_s16): Remove.
(vrshrq_n_u32): Remove.
(vrshrq_n_s32): Remove.
(vrshrq_m_n_s8): Remove.
(vrshrq_m_n_s32): Remove.
(vrshrq_m_n_s16): Remove.
(vrshrq_m_n_u8): Remove.
(vrshrq_m_n_u32): Remove.
(vrshrq_m_n_u16): Remove.
(vshrq_m_n_s8): Remove.
(vshrq_m_n_s32): Remove.
(vshrq_m_n_s16): Remove.
(vshrq_m_n_u8): Remove.
(vshrq_m_n_u32): Remove.
(vshrq_m_n_u16): Remove.
(vrshrq_x_n_s8): Remove.
(vrshrq_x_n_s16): Remove.
(vrshrq_x_n_s32): Remove.
(vrshrq_x_n_u8): Remove.
(vrshrq_x_n_u16): Remove.
(vrshrq_x_n_u32): Remove.
(vshrq_x_n_s8): Remove.
(vshrq_x_n_s16): Remove.
(vshrq_x_n_s32): Remove.
(vshrq_x_n_u8): Remove.
(vshrq_x_n_u16): Remove.
(vshrq_x_n_u32): Remove.
(__arm_vshrq_n_s8): Remove.
(__arm_vshrq_n_s16): Remove.
(__arm_vshrq_n_s32): Remove.
(__arm_vshrq_n_u8): Remove.
(__arm_vshrq_n_u16): Remove.
(__arm_vshrq_n_u32): Remove.
(__arm_vrshrq_n_u8): Remove.
(__arm_vrshrq_n_s8): Remove.
(__arm_vrshrq_n_u16): Remove.
(__arm_vrshrq_n_s16): Remove.
(__arm_vrshrq_n_u32): Remove.
(__arm_vrshrq_n_s32): Remove.
(__arm_vrshrq_m_n_s8): Remove.
(__arm_vrshrq_m_n_s32): Remove.
(__arm_vrshrq_m_n_s16): Remove.
(__arm_vrshrq_m_n_u8): Remove.
(__arm_vrshrq_m_n_u32): Remove.
(__arm_vrshrq_m_n_u16): Remove.
(__arm_vshrq_m_n_s8): Remove.
(__arm_vshrq_m_n_s32): Remove.
(__arm_vshrq_m_n_s16): Remove.
(__arm_vshrq_m_n_u8): Remove.
(__arm_vshrq_m_n_u32): Remove.
(__arm_vshrq_m_n_u16): Remove.
(__arm_vrshrq_x_n_s8): Remove.
(__arm_vrshrq_x_n_s16): Remove.
(__arm_vrshrq_x_n_s32): Remove.
(__arm_vrshrq_x_n_u8): Remove.
(__arm_vrshrq_x_n_u16): Remove.
(__arm_vrshrq_x_n_u32): Remove.
(__arm_vshrq_x_n_s8): Remove.
(__arm_vshrq_x_n_s16): Remove.
(__arm_vshrq_x_n_s32): Remove.
(__arm_vshrq_x_n_u8): Remove.
(__arm_vshrq_x_n_u16): Remove.
(__arm_vshrq_x_n_u32): Remove.
(__arm_vshrq): Remove.
(__arm_vrshrq): Remove.
(__arm_vrshrq_m): Remove.
(__arm_vshrq_m): Remove.
(__arm_vrshrq_x): Remove.
(__arm_vshrq_x): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   2 +
 gcc/config/arm/arm-mve-builtins-base.def |   2 +
 gcc/config/arm/arm-mve-builtins-base.h   |   2 +
 gcc/config/arm/arm_mve.h | 628 ---
 4 files changed, 6 insertions(+), 628 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index e7d2e0abffc..bb585a3921f 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -225,9 +225,11 @@ FUNCTION_WITHOUT_N_NO_F (vrmulhq, VRMULHQ)
 FUNCTION_WITH_M_N_NO_F (vrshlq, VRSHLQ)
 FUNCTION_ONLY_N_NO_F (vrshrnbq, VRSHRNBQ)
 FUNCTION_ONLY_N_NO_F (vrshrntq, VRSHRNTQ)
+FUNCTION_ONLY_N_NO_F (vrshrq, VRSHRQ)
 FUNCTION_WITH_M_N_R (vshlq, VSHLQ)
 FUNCTION_ONLY_N_NO_F (vshrnbq, VSHRNBQ)
 FUNCTION_ONLY_N_NO_F (vshrntq, VSHRNTQ)
+FUNCTION_ONLY_N_NO_F (vshrq, VSHRQ)
 FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
 FUNCTION (vuninitializedq, vuninitializedq_impl,)
 
diff --git a/gcc/config/arm/arm-mve-builtins-base.def 
b/gcc/config/arm/arm-mve-builtins-base.def
index 50cb2d055e9..33c95c02396 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -51,10 +51,12 @@ DEF_MVE_FUNCTION (vrmulhq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vrshlq, binary_round_lshift, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vrshrnbq, binary_rshift_narrow, integer_16_32, m_or_none)
 DEF_MVE_FUNCTION (vrshrntq, binary_rshift_narrow, integer_16_32, m_or_none)
+DEF_MVE_FUNCTION (vrshrq, binary_rshift, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vshlq, binary_lshift, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vshlq, binary_lshift_r, all_integer, m_or_none) // "_r" 
forms do not s

Support parallel testing in libgomp, part I [PR66005]

2023-05-05 Thread Thomas Schwinge
Hi!

[Putting Bernhard, Honza, Segher in CC, as they are eager to test this,
based on recent comments on IRC.]  ;-P


First, establish the parallel testing infrastructure -- while still
hard-coding the number of parallel slots to one.

On 2015-05-08T10:40:02+0200, I wrote:
> On Thu, 7 May 2015 13:39:40 +0200, Jakub Jelinek  wrote:
>> On Thu, May 07, 2015 at 01:26:57PM +0200, Rainer Orth wrote:
>> > As reported in the PR, with the addition of all those OpenACC tests,
>> > libgomp make check times have skyrocketed since the testsuite is still
>> > run sequentially.
>
> ACK.  And, thanks for looking into that!
>
>> > Fixing this proved trivial: I managed to almost literally copy the
>> > solution from libstdc++-v3/testsuite/Makefile.am

So I recently had re-created this patch independently, before remembering
that Rainer had -- just eight years ago... ;-) -- already submitted this.
Fortunately, we've done it in almost exactly the same way, except that a few
libgomp testsuite infrastructure changes, as well as libstdc++ parallel
testing infrastructure changes (which is where this code has been copied
from), have since gone in, which I've adjusted for.  Additionally, use
'check_DEJAGNU_libgomp_targets' instead of 'check_DEJAGNU_normal_targets'
etc. (where "normal" is a libstdc++ detail), and regarding:

>> > with a minimal change
>> > to libgomp.exp so the generated libgomp-test-support.exp file is found
>> > in both the sequential and parallel cases.  This isn't an issue in
>> > libstdc++ since all necessary variables are stored in a single
>> > site.exp.

... in 'libgomp/testsuite/lib/libgomp.exp', I've changed:

-load_file libgomp-test-support.exp
+# Search in both .. and . to support parallel and sequential testing.
+load_file -1 ../libgomp-test-support.exp libgomp-test-support.exp

... into the more explicit:

-load_file libgomp-test-support.exp
+# Search in '..' vs. '.' to support parallel vs. sequential testing.
+if [info exists ::env(GCC_RUNTEST_PARALLELIZE_DIR)] {
+load_file ../libgomp-test-support.exp
+} else {
+load_file libgomp-test-support.exp
+}

And, for now, I hard-code the number of parallel slots to one.  This
means that libgomp for 'make -j' now does use the parallel testing code
paths, but is restricted to just one slot.  That is, no actual change in
behavior, other than that 'libgomp.sum' then is filtered through
'contrib/dg-extract-results.sh'.

OK to push the attached
"Support parallel testing in libgomp, part I [PR66005]"?


Grüße
 Thomas


>> It is far from trivial though.
>> The point is that most of the OpenMP tests are parallelized with the
>> default OMP_NUM_THREADS, so running the tests in parallel oversubscribes the
>> machine a lot, the higher number of hw threads the more.
>
> Do you agree that we have two classes of test cases in libgomp: 1) test
> cases that don't place a considerably higher load on the machine compared
> to "normal" (single-threaded) execution tests, because they're just
> testing some functionality that is not expected to actively depend
> on/interfere with parallelism.  If needed, and/or if not already done,
> such test cases can be parameterized (OMP_NUM_THREADS, OpenACC num_gangs,
> num_workers, vector_length clauses, and so on) for low parallelism
> levels.  And, 2) test cases that place a considerably higher load on the
> machine compared to "normal" (single-threaded) execution tests, because
> they're testing some functionality that actively depends on/interferes
> with some kind of parallelism.  What about marking such tests specially,
> such that DejaGnu will only ever schedule one of them for execution at
> the same time?  For example, a new dg-* directive to run them wrapped
> through »flock [libgomp/testsuite/serial.lock] [a.out]« or some such?
>
>> If we go forward with some parallelization of the tests, we at least should
>> try to export something like OMP_WAIT_POLICY=passive so that the
>> oversubscribed machine would at least not spend too much time in spinning.
>
> (Will again have the problem that DejaGnu doesn't provide infrastructure
> to communicate environment variables to boards in remote testing.)
>
>> And perhaps reconsider running all OpenACC threads 3 times, just allow
>> user to select which offloading target they want to test (host fallback,
>> the host nonshm hack, PTX, XeonPHI in the future?), and test just that
>> (that is pretty much how OpenMP offloading testing works).
>
> My rationale is: if you configure GCC to support a set of offloading
> devices (more than one), you'll also want to get the test coverage that
> indeed all these work as expected.  (It currently doesn't matter, but...)
> that's something I'd like to see improved in the libgomp OpenMP
> offloading testing (once it supports more than one architecture for
> offloading).
>
>> For tests that
>> always want to test host fallback, I hope OpenACC offers clauses to force
>> the host fallback.
>
> Yes.
>
>
> Grüße,
>  Thomas


Support parallel testing in libgomp, part II [PR66005]

2023-05-05 Thread Thomas Schwinge
Hi!

On 2023-05-05T10:55:41+0200, I wrote:
> [Putting Bernhard, Honza, Segher in CC, as they are eager to test this,
> based on recent comments on IRC.]  ;-P


> First, establish the parallel testing infrastructure -- while still
> hard-coding the number of parallel slots to one.

> OK to push the attached
> "Support parallel testing in libgomp, part I [PR66005]"?

On top of that, second, enable parallel testing.

> On 2015-05-08T10:40:02+0200, I wrote:
>> On Thu, 7 May 2015 13:39:40 +0200, Jakub Jelinek  wrote:
>>> On Thu, May 07, 2015 at 01:26:57PM +0200, Rainer Orth wrote:
>>> It is far from trivial though.
>>> The point is that most of the OpenMP tests are parallelized with the
>>> default OMP_NUM_THREADS, so running the tests in parallel oversubscribes the
>>> machine a lot, the higher number of hw threads the more.
>>
>> Do you agree that we have two classes of test cases in libgomp: 1) test
>> cases that don't place a considerably higher load on the machine compared
>> to "normal" (single-threaded) execution tests, because they're just
>> testing some functionality that is not expected to actively depend
>> on/interfere with parallelism.  If needed, and/or if not already done,
>> such test cases can be parameterized (OMP_NUM_THREADS, OpenACC num_gangs,
>> num_workers, vector_length clauses, and so on) for low parallelism
>> levels.  And, 2) test cases that place a considerably higher load on the
>> machine compared to "normal" (single-threaded) execution tests, because
>> they're testing some functionality that actively depends on/interferes
>> with some kind of parallelism.  What about marking such tests specially,
>> such that DejaGnu will only ever schedule one of them for execution at
>> the same time?  For example, a new dg-* directive to run them wrapped
>> through »flock [libgomp/testsuite/serial.lock] [a.out]« or some such?

Bernhard on GCC IRC also suggested:

2023-04-25T19:32:57+0200:  tschwinge, we could have a dg-do
  run-serial for tests that themselves occupy more/all CPUs. Or maybe
  it would be better to look at the testcases to see if they do that
  and put them in a "serial queue". I did not look, but there
  certainly are at least some tests which we could run in parallel.

So while there certainly is potential for using more parallelism in
execution testing, I've however now implemented what I'd described in
:

| [...] parallelize *all* compilation, while just allowing for *one*
| execution test job slot.  That will require some GCC DejaGnu test
| harness hackery which I've [now] gotten to look into.  That is, enable
| the usual GCC/DejaGnu parallel testing, but also have some kind of
| mutex for the execution test invocation.  This has to play nicely with
| DejaGnu timeout handling, etc.

OK to push the attached
"Support parallel testing in libgomp, part II [PR66005]"?
See the Git commit log for further discussion.


Grüße
 Thomas


>>> If we go forward with some parallelization of the tests, we at least should
>>> try to export something like OMP_WAIT_POLICY=passive so that the
>>> oversubscribed machine would at least not spend too much time in spinning.
>>
>> (Will again have the problem that DejaGnu doesn't provide infrastructure
>> to communicate environment variables to boards in remote testing.)
>>
>>> And perhaps reconsider running all OpenACC threads 3 times, just allow
>>> user to select which offloading target they want to test (host fallback,
>>> the host nonshm hack, PTX, XeonPHI in the future?), and test just that
>>> (that is pretty much how OpenMP offloading testing works).
>>
>> My rationale is: if you configure GCC to support a set of offloading
>> devices (more than one), you'll also want to get the test coverage that
>> indeed all these work as expected.  (It currently doesn't matter, but...)
>> that's something I'd like to see improved in the libgomp OpenMP
>> offloading testing (once it supports more than one architecture for
>> offloading).
>>
>>> For tests that
>>> always want to test host fallback, I hope OpenACC offers clauses to force
>>> the host fallback.
>>
>> Yes.
>>
>>
>> Grüße,
>>  Thomas


From f1ae4a3675ad1147aaa88405be9f000ceed703bc Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 25 Apr 2023 23:53:12 +0200
Subject: [PATCH] Support parallel testing in libgomp, part II [PR66005]

..., and enable if 'flock' is available for serializing execution testing.

Regarding the default of 19 parallel slots, this turned out to be a local
minimum for wall time when testing this on:

$ uname -srvi
Linux 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64
$ grep '^model name' < /p

[PATCH] tree: Fix up save_expr [PR52339]

2023-05-05 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, save_expr seems to be very optimistic about
when some expression is invariant, which can result in various
wrong-code issues.
The problem is with the TREE_READONLY (t) && !TREE_SIDE_EFFECTS (t)
case in tree_invariant_p_1.  TREE_READONLY (t) in that case says
that the object shouldn't be modified during its lifetime and
!TREE_SIDE_EFFECTS (t) that it can be evaluated safely multiple times,
but that doesn't mean we can avoid wrapping the expression into SAVE_EXPR
say for a TREE_READONLY COMPONENT_REF with INDIRECT_REF as first operand
- either the lifetime of the TREE_READONLY object could end earlier than
when we need to reevaluate the object (that happens in the
pr52339-1.c case where save_expr is called on p->a and then free (p) is
done or pr52339.C where delete a->b when calling ~B () dtor deallocates a),
or e.g. the pointer could change as in pr52339-2.c (so evaluating p->a again
after ++p yields a possibly different value than originally and again we need
a SAVE_EXPR).

Attached are two patches which fix this, unfortunately both regress
FAIL: gnat.dg/loop_optimization21.adb scan-tree-dump-times optimized 
"Index_Check" 1
FAIL: gnat.dg/vect1.adb scan-tree-dump-times vect "vectorized 1 loops" 15
FAIL: gnat.dg/vect2.adb scan-tree-dump-times vect "vectorized 1 loops" 15
FAIL: gnat.dg/vect3.adb scan-tree-dump-times vect "vectorized 1 loops" 15
FAIL: gnat.dg/vect4.adb scan-tree-dump-times vect "vectorized 1 loops" 15
FAIL: gnat.dg/vect5.adb scan-tree-dump-times vect "vectorized 1 loops" 15
FAIL: gnat.dg/vect6.adb scan-tree-dump-times vect "vectorized 1 loops" 15
on x86_64-linux (the first scan triggers 2 times rather than once,
the next three trigger 13 times rather than 15, and the last three
14 times rather than 15).
The first patch has been otherwise successfully bootstrapped/regtested on
x86_64-linux and i686-linux (with that above regressions), the second one
is probably better but has been so far tested just on the new testcases and
verified to also cause the above Ada regressions.

Ok for trunk (and if so, which version)?  What should we do about the
regressions -- just adjust the expected counts, or something else?
E.g. in the vect6.adb case it is the
   procedure Add (X : Varray; Y : Long_Float; R : out Varray) is
   begin
  for I in X'Range loop
 R(I) := X(I) + Y;
  end loop;
   end;
function that is no longer vectorized.
Both patches lead to lots of former
r.P_BOUNDS->LB0
r.P_BOUNDS->UB0
x.P_BOUNDS->LB0
x.P_BOUNDS->UB0
expressions to be wrapped into SAVE_EXPRs.

Jakub
2023-05-05  Jakub Jelinek  

PR c++/52339
* tree.cc (tree_invariant_p_2): New function, copied from
tree_invariant_p.
(contains_indirect_refs): New function.
(tree_invariant_p): Rewritten as wrapper around tree_invariant_p_2.
Return false if expression contains INDIRECT_REFs or MEM_REFs which
actually dereference some pointer.
(save_expr): Use SAVE_EXPR if tree_invariant_p_1 expression contains
INDIRECT_REFs or MEM_REFs which actually dereference some pointer.
(skip_simple_arithmetic): Use tree_invariant_p_2 instead of
tree_invariant_p.

* g++.dg/opt/pr52339.C: New test.
* gcc.c-torture/execute/pr52339-1.c: New test.
* gcc.c-torture/execute/pr52339-2.c: New test.

--- gcc/tree.cc.jj  2023-05-01 09:59:46.686293833 +0200
+++ gcc/tree.cc 2023-05-04 15:40:58.684762277 +0200
@@ -3920,13 +3920,43 @@ tree_invariant_p_1 (tree t)
 
 /* Return true if T is function-invariant.  */
 
-bool
-tree_invariant_p (tree t)
+static bool
+tree_invariant_p_2 (tree t)
 {
   tree inner = skip_simple_arithmetic (t);
   return tree_invariant_p_1 (inner);
 }
 
+/* Return non-NULL if *TP is INDIRECT_REF or MEM_REF with first operand
+   other than address of a decl.  */
+
+static tree
+contains_indirect_refs (tree *tp, int *, void *)
+{
+  tree t = *tp;
+  if (TREE_CODE (t) == INDIRECT_REF)
+return t;
+  else if (TREE_CODE (t) == MEM_REF
+  && (TREE_CODE (TREE_OPERAND (t, 0)) != ADDR_EXPR
+  || !DECL_P (TREE_OPERAND (TREE_OPERAND (t, 0), 0))))
+return t;
+  else
+return NULL_TREE;
+}
+
+/* Return true if T is function-invariant.  Return false for expressions
+   containing pointer/reference dereferences.  */
+
+bool
+tree_invariant_p (tree t)
+{
+  if (!tree_invariant_p_2 (t))
+return false;
+  return (TREE_CONSTANT (t)
+ || !walk_tree_without_duplicates (&t, contains_indirect_refs,
+   NULL));
+}
+
 /* Wrap a SAVE_EXPR around EXPR, if appropriate.
Do this to any expression which may be used in more than one place,
but must be evaluated only once.
@@ -3963,7 +3993,13 @@ save_expr (tree expr)
   if (TREE_CODE (inner) == ERROR_MARK)
 return inner;
 
-  if (tree_invariant_p_1 (inner))
+  if (tree_invariant_p_1 (inner)
+  && (TREE_CONSTANT (expr)
+ /* Use SAVE_EXPR if there are any pointer dereferences, evaluating
+   

Re: [PATCH] Add emulated scatter capability to the vectorizer

2023-05-05 Thread Christophe Lyon via Gcc-patches
On Wed, 3 May 2023 at 08:44, Richard Biener  wrote:

> On Tue, 2 May 2023, Christophe Lyon wrote:
>
> > Hi Richard,
> >
> > On Fri, 28 Apr 2023 at 14:41, Richard Biener via Gcc-patches <
> > gcc-patches@gcc.gnu.org> wrote:
> >
> > > This adds a scatter vectorization capability to the vectorizer
> > > without target support by decomposing the offset and data vectors
> > > and then performing scalar stores in the order of vector lanes.
> > > This is aimed at cases where vectorizing the rest of the loop
> > > offsets the cost of vectorizing the scatter.
> > >
> > > The offset load is still vectorized and costed as such, but like
> > > with emulated gather those will be turned back to scalar loads
> > > by forwprop.
> > >
> > > Slightly fixed compared to the version posted in autumn,
> > > re-bootstrapped & tested on x86_64-unknown-linux-gnu and pushed.
> > >
> > > Richard.
> > >
> > > * tree-vect-data-refs.cc (vect_analyze_data_refs): Always
> > > consider scatters.
> > > * tree-vect-stmts.cc (vect_model_store_cost): Pass in the
> > > gather-scatter info and cost emulated scatters accordingly.
> > > (get_load_store_type): Support emulated scatters.
> > > (vectorizable_store): Likewise.  Emulate them by extracting
> > > scalar offsets and data, doing scalar stores.
> > >
> > > * gcc.dg/vect/pr25413a.c: Un-XFAIL everywhere.
> > >
> >
> > We are now seeing these failures after this patch was committed:
> > FAIL:  gcc.dg/vect/pr25413a.c -flto -ffat-lto-objects
> scan-tree-dump-times
> > vect "vectorized 2 loops" 1
> > FAIL:  gcc.dg/vect/pr25413a.c scan-tree-dump-times vect "vectorized 2
> > loops" 1
> > on aarch64
>
> Looks like to vectorize the scatter we need a size_t vector
> multiplication.  But the vect_long_mult target includes aarch64.
>
> Is that the actual issue?
>
> With armv8-a+sve it seems to indeed have size_t*size_t multiplication
> and the testcase works.  What architecture level are you testing
> with?  Can we fix check_effective_target_vect_long_mult?
>

Indeed, it seems this is the problem: I'm running on a Neoverse-N1, which
does not have SVE.

Christophe


> Richard.
>
> > Christophe
> >
> >
> > * gcc.dg/vect/vect-71.c: Likewise.
> > > * gcc.dg/vect/tsvc/vect-tsvc-s4113.c: Likewise.
> > > * gcc.dg/vect/tsvc/vect-tsvc-s491.c: Likewise.
> > > * gcc.dg/vect/tsvc/vect-tsvc-vas.c: Likewise.
> > > ---
> > >  gcc/testsuite/gcc.dg/vect/pr25413a.c  |   3 +-
> > >  .../gcc.dg/vect/tsvc/vect-tsvc-s4113.c|   2 +-
> > >  .../gcc.dg/vect/tsvc/vect-tsvc-s491.c |   2 +-
> > >  .../gcc.dg/vect/tsvc/vect-tsvc-vas.c  |   2 +-
> > >  gcc/testsuite/gcc.dg/vect/vect-71.c   |   2 +-
> > >  gcc/tree-vect-data-refs.cc|   4 +-
> > >  gcc/tree-vect-stmts.cc| 117 ++
> > >  7 files changed, 97 insertions(+), 35 deletions(-)
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/pr25413a.c
> > > b/gcc/testsuite/gcc.dg/vect/pr25413a.c
> > > index e444b2c3e8e..ffb517c9ce0 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/pr25413a.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/pr25413a.c
> > > @@ -123,7 +123,6 @@ int main (void)
> > >return 0;
> > >  }
> > >
> > > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {
> > > target { ! vect_scatter_store } } } } */
> > > -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" {
> > > target vect_scatter_store } } } */
> > > +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" }
> } */
> > >  /* { dg-final { scan-tree-dump-times "vector alignment may not be
> > > reachable" 1 "vect" { target { ! vector_alignment_reachable  } } } } */
> > >  /* { dg-final { scan-tree-dump-times "Alignment of access forced using
> > > versioning" 1 "vect" { target { ! vector_alignment_reachable } } } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
> > > b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
> > > index b64682a65df..ddb7e9dc0e8 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
> > > @@ -39,4 +39,4 @@ int main (int argc, char **argv)
> > >return 0;
> > >  }
> > >
> > > -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { !
> > > aarch64_sve }  } } } */
> > > +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
> > > b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
> > > index 8465e137070..29e90ff0aff 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
> > > @@ -39,4 +39,4 @@ int main (int argc, char **argv)
> > >return 0;
> > >  }
> > >
> > > -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { !
> > > aarch64_sve }  } } } */
> > > +/* { dg-final { scan-tree-dump "v

Re: [PATCH] gimple-range-op: Improve handling of sqrt ranges

2023-05-05 Thread Aldy Hernandez via Gcc-patches




On 5/5/23 10:00, Jakub Jelinek wrote:

Hi!

The previous patch just added basic intrinsic ranges for sqrt
([-0.0, +Inf] +-NAN being the general result range of the function
and [-0.0, +Inf] the general operand range if the result isn't NAN etc.).
The following patch intersects those ranges with a particular range
computed from the argument's or result's exact range, with the expected
error in ulps taken into account, and adds a function (a frange_arithmetic
variant) that can be used by other functions as a helper as well.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-05-05  Jakub Jelinek  

* value-range.h (frange_arithmetic): Declare.
* range-op-float.cc (frange_arithmetic): No longer static.
* gimple-range-op.cc (frange_mpfr_arg1): New function.
(cfn_sqrt::fold_range): Intersect the generic boundaries range
with range computed from sqrt of the particular bounds.
(cfn_sqrt::op1_range): Intersect the generic boundaries range
with range computed from squared particular bounds.

* gcc.dg/tree-ssa/range-sqrt-2.c: New test.

--- gcc/value-range.h.jj2023-05-04 13:34:45.140336099 +0200
+++ gcc/value-range.h   2023-05-04 16:28:18.286108178 +0200
@@ -1294,5 +1294,8 @@ frange::nan_signbit_p (bool &signbit) co
  
  void frange_nextafter (enum machine_mode, REAL_VALUE_TYPE &,

   const REAL_VALUE_TYPE &);
+void frange_arithmetic (enum tree_code, tree, REAL_VALUE_TYPE &,
+   const REAL_VALUE_TYPE &, const REAL_VALUE_TYPE &,
+   const REAL_VALUE_TYPE &);
  
  #endif // GCC_VALUE_RANGE_H

--- gcc/range-op-float.cc.jj2023-05-04 13:34:45.139336114 +0200
+++ gcc/range-op-float.cc   2023-05-04 16:28:18.285108192 +0200
@@ -305,7 +305,7 @@ frange_nextafter (enum machine_mode mode
  // SF/DFmode (when storing into memory from the 387 stack).  Maybe
  // this is ok as well though it is just occasionally more precise. ??
  
-static void

+void
  frange_arithmetic (enum tree_code code, tree type,
   REAL_VALUE_TYPE &result,
   const REAL_VALUE_TYPE &op1,
--- gcc/gimple-range-op.cc.jj   2023-05-04 13:34:45.139336114 +0200
+++ gcc/gimple-range-op.cc  2023-05-04 19:58:44.842606865 +0200
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.
  #include "value-query.h"
  #include "gimple-range.h"
  #include "attr-fnspec.h"
+#include "realmpfr.h"
  
  // Given stmt S, fill VEC, up to VEC_SIZE elements, with relevant ssa-names

  // on the statement.  For efficiency, it is an error to not pass in enough
@@ -403,6 +404,66 @@ public:
}
  } op_cfn_copysign;
  
+/* Compute FUNC (ARG) where FUNC is a mpfr function.  If RES_LOW is non-NULL,

+   set it to low bound of possible range if the function is expected to have
+   ULPS precision and similarly if RES_HIGH is non-NULL, set it to high bound.
+   If the function returns false, the results weren't set.  */
+
+static bool
+frange_mpfr_arg1 (REAL_VALUE_TYPE *res_low, REAL_VALUE_TYPE *res_high,
+ int (*func) (mpfr_ptr, mpfr_srcptr, mpfr_rnd_t),
+ const REAL_VALUE_TYPE &arg, tree type, unsigned ulps)
+{


Since you're returning a range of sorts [low, high], would it be cleaner 
to return an frange, or is always calculating low/high too expensive?  I 
notice you avoid it when passing NULL.


Would you mind adding a typedef for the (*func) callback above?  I 
always find C callbacks a pain to read.


Thanks.
Aldy


+  if (ulps == ~0U || !real_isfinite (&arg))
+return false;
+  machine_mode mode = TYPE_MODE (type);
+  const real_format *format = REAL_MODE_FORMAT (mode);
+  auto_mpfr m (format->p);
+  mpfr_from_real (m, &arg, MPFR_RNDN);
+  mpfr_clear_flags ();
+  bool inexact = func (m, m, MPFR_RNDN);
+  if (!mpfr_number_p (m) || mpfr_overflow_p () || mpfr_underflow_p ())
+return false;
+
+  REAL_VALUE_TYPE value, result;
+  real_from_mpfr (&value, m, format, MPFR_RNDN);
+  if (!real_isfinite (&value))
+return false;
+  if ((value.cl == rvc_zero) != (mpfr_zero_p (m) != 0))
+inexact = true;
+
+  real_convert (&result, format, &value);
+  if (!real_isfinite (&result))
+return false;
+  bool round_low = false;
+  bool round_high = false;
+  if (!ulps && flag_rounding_math)
+++ulps;
+  if (inexact || !real_identical (&result, &value))
+{
+  if (MODE_COMPOSITE_P (mode))
+   round_low = round_high = true;
+  else
+   {
+ round_low = !real_less (&result, &value);
+ round_high = !real_less (&value, &result);
+   }
+}
+  if (res_low)
+{
+  *res_low = result;
+  for (unsigned int i = 0; i < ulps + round_low; ++i)
+   frange_nextafter (mode, *res_low, dconstninf);
+}
+  if (res_high)
+{
+  *res_high = result;
+  for (unsigned int i = 0; i < ulps + round_high; ++i)
+   frange_nextafter (mode, *res_high, dconstinf);
+}
+  return true;
+}
+
  class cfn_sqrt : public range_opera

Re: [PATCH] gimple-range-op: Improve handling of sqrt ranges

2023-05-05 Thread Jakub Jelinek via Gcc-patches
On Fri, May 05, 2023 at 11:06:31AM +0200, Aldy Hernandez wrote:
> > +/* Compute FUNC (ARG) where FUNC is a mpfr function.  If RES_LOW is 
> > non-NULL,
> > +   set it to low bound of possible range if the function is expected to 
> > have
> > +   ULPS precision and similarly if RES_HIGH is non-NULL, set it to high 
> > bound.
> > +   If the function returns false, the results weren't set.  */
> > +
> > +static bool
> > +frange_mpfr_arg1 (REAL_VALUE_TYPE *res_low, REAL_VALUE_TYPE *res_high,
> > + int (*func) (mpfr_ptr, mpfr_srcptr, mpfr_rnd_t),
> > + const REAL_VALUE_TYPE &arg, tree type, unsigned ulps)
> > +{
> 
> Since you're returning a range of sorts [low, high], would it be cleaner to
> return an frange, or is always calculating low/high too expensive?  I notice
> you avoid it when passing NULL.

The point was that the caller can tell which bound it needs, low, high or
both and we don't waste time calculating ones we don't need (especially with
larger values of ulps).  E.g. for the sqrt case we only need one of them,
but when I thought about the sin/cos case, I'll probably need both and
calling the function twice would mean repeating the even more expensive mpfr
call.

> Would you mind adding a typedef for the (*func) callback above?  I always
> find C callbacks a pain to read.

I can, what I used comes from elsewhere (builtins.cc/fold-const-call.cc
which use it like that).

Jakub



Re: [PATCH] gimple-range-op: Improve handling of sqrt ranges

2023-05-05 Thread Aldy Hernandez via Gcc-patches
On Fri, May 5, 2023 at 11:14 AM Jakub Jelinek  wrote:
>
> On Fri, May 05, 2023 at 11:06:31AM +0200, Aldy Hernandez wrote:
> > > +/* Compute FUNC (ARG) where FUNC is a mpfr function.  If RES_LOW is 
> > > non-NULL,
> > > +   set it to low bound of possible range if the function is expected to 
> > > have
> > > +   ULPS precision and similarly if RES_HIGH is non-NULL, set it to high 
> > > bound.
> > > +   If the function returns false, the results weren't set.  */
> > > +
> > > +static bool
> > > +frange_mpfr_arg1 (REAL_VALUE_TYPE *res_low, REAL_VALUE_TYPE *res_high,
> > > + int (*func) (mpfr_ptr, mpfr_srcptr, mpfr_rnd_t),
> > > + const REAL_VALUE_TYPE &arg, tree type, unsigned ulps)
> > > +{
> >
> > Since you're returning a range of sorts [low, high], would it be cleaner to
> > return an frange, or is always calculating low/high too expensive?  I notice
> > you avoid it when passing NULL.
>
> The point was that the caller can tell which bound it needs, low, high or
> both and we don't waste time calculating ones we don't need (especially with
> larger values of ulps).  E.g. for the sqrt case we only need one of them,
> but when I thought about the sin/cos case, I'll probably need both and
> calling the function twice would mean repeating the even more expensive mpfr
> call.
>
> > Would you mind adding a typedef for the (*func) callback above?  I always
> > find C callbacks a pain to read.
>
> I can, what I used comes from elsewhere (builtins.cc/fold-const-call.cc
> which use it like that).

It would be my preference to have a typedef in
builtins.cc/fold-const-call.cc as well, as we could clean everything
up.  But I defer to you as a global maintainer, whether we want to do
that or not.  If not, then don't bother with just cleaning up
gimple-range-op.cc.

LGTM.

Aldy



[PATCH 5/8] MIPS: Add LUI instruction for mips16e2

2023-05-05 Thread Jie Mei
This patch adds LUI instruction from mips16e2
with corresponding test.

gcc/ChangeLog:

* gcc/config/mips/mips.cc(mips_symbol_insns_1): Generates LUI 
instruction.
(mips_const_insns): Same as above.
(mips_output_move): Same as above.
(mips_output_function_prologue): Same as above.
* gcc/config/mips/mips.md: Same as above

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: Add new tests for mips16e2.
---
 gcc/config/mips/mips.cc  | 44 ++--
 gcc/config/mips/mips.md  |  2 +-
 gcc/testsuite/gcc.target/mips/mips16e2.c | 22 
 3 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index d86911d10c2..0792f89cab4 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -2295,7 +2295,9 @@ mips_symbol_insns_1 (enum mips_symbol_type type, 
machine_mode mode)
 The final address is then $at + %lo(symbol).  With 32-bit
 symbols we just need a preparatory LUI for normal mode and
 a preparatory LI and SLL for MIPS16.  */
-  return ABI_HAS_64BIT_SYMBOLS ? 6 : TARGET_MIPS16 ? 3 : 2;
+  return ABI_HAS_64BIT_SYMBOLS 
+ ? 6 
+ : (TARGET_MIPS16 && !ISA_HAS_MIPS16E2) ? 3 : 2;
 
 case SYMBOL_GP_RELATIVE:
   /* Treat GP-relative accesses as taking a single instruction on
@@ -2867,7 +2869,7 @@ mips_const_insns (rtx x)
 
   /* This is simply an LUI for normal mode.  It is an extended
 LI followed by an extended SLL for MIPS16.  */
-  return TARGET_MIPS16 ? 4 : 1;
+  return TARGET_MIPS16 ? (ISA_HAS_MIPS16E2 ? 2 : 4) : 1;
 
 case CONST_INT:
   if (TARGET_MIPS16)
@@ -2879,7 +2881,10 @@ mips_const_insns (rtx x)
: SMALL_OPERAND_UNSIGNED (INTVAL (x)) ? 2
: IN_RANGE (-INTVAL (x), 0, 255) ? 2
: SMALL_OPERAND_UNSIGNED (-INTVAL (x)) ? 3
-   : 0);
+   : ISA_HAS_MIPS16E2
+ ? (trunc_int_for_mode (INTVAL (x), SImode) == INTVAL (x)
+? 4 : 8)
+ : 0);
 
   return mips_build_integer (codes, INTVAL (x));
 
@@ -5252,6 +5257,11 @@ mips_output_move (rtx dest, rtx src)
  if (!TARGET_MIPS16)
return "li\t%0,%1\t\t\t# %X1";
 
+ if (ISA_HAS_MIPS16E2
+ && LUI_INT (src)
+ && !SMALL_OPERAND_UNSIGNED (INTVAL (src)))
+   return "lui\t%0,%%hi(%1)\t\t\t# %X1";
+
  if (SMALL_OPERAND_UNSIGNED (INTVAL (src)))
return "li\t%0,%1";
 
@@ -5260,7 +5270,7 @@ mips_output_move (rtx dest, rtx src)
}
 
   if (src_code == HIGH)
-   return TARGET_MIPS16 ? "#" : "lui\t%0,%h1";
+   return (TARGET_MIPS16 && !ISA_HAS_MIPS16E2) ? "#" : "lui\t%0,%h1";
 
   if (CONST_GP_P (src))
return "move\t%0,%1";
@@ -11983,13 +11993,25 @@ mips_output_function_prologue (FILE *file)
 {
   if (TARGET_MIPS16)
{
- /* This is a fixed-form sequence.  The position of the
-first two instructions is important because of the
-way _gp_disp is defined.  */
- output_asm_insn ("li\t$2,%%hi(_gp_disp)", 0);
- output_asm_insn ("addiu\t$3,$pc,%%lo(_gp_disp)", 0);
- output_asm_insn ("sll\t$2,16", 0);
- output_asm_insn ("addu\t$2,$3", 0);
+ if (ISA_HAS_MIPS16E2)
+   {
+ /* This is a fixed-form sequence.  The position of the
+first two instructions is important because of the
+way _gp_disp is defined.  */
+ output_asm_insn ("lui\t$2,%%hi(_gp_disp)", 0);
+ output_asm_insn ("addiu\t$3,$pc,%%lo(_gp_disp)", 0);
+ output_asm_insn ("addu\t$2,$3", 0);
+   }
+ else
+   {
+ /* This is a fixed-form sequence.  The position of the
+first two instructions is important because of the
+way _gp_disp is defined.  */
+ output_asm_insn ("li\t$2,%%hi(_gp_disp)", 0);
+ output_asm_insn ("addiu\t$3,$pc,%%lo(_gp_disp)", 0);
+ output_asm_insn ("sll\t$2,16", 0);
+ output_asm_insn ("addu\t$2,$3", 0);
+   }
}
   else
{
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 9f652310aa2..73c9acd484f 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -4634,7 +4634,7 @@
 (define_split
   [(set (match_operand:P 0 "d_operand")
(high:P (match_operand:P 1 "symbolic_operand_with_high")))]
-  "TARGET_MIPS16 && reload_completed"
+  "TARGET_MIPS16 && reload_completed && !ISA_HAS_MIPS16E2"
   [(set (match_dup 0) (unspec:P [(match_dup 1)] UNSPEC_UNSHIFTED_HIGH))
(set (match_dup 0) (ashift:P (match_dup 0) (const_int 16)))])
 
diff --git a/gcc/testsuite/gcc.target/mips/mips16e2.c 
b/gcc/testsuite/gcc.target/mips/mips16e2.c
index ce8b4f1819b..780891b4056 100644
--- a/gcc/testsuite/gcc.target/mips/

[PATCH 6/8] MIPS: Add load/store word left/right instructions for mips16e2

2023-05-05 Thread Jie Mei
This patch adds LWL/LWR, SWL/SWR instructions with their
corresponding tests.

gcc/ChangeLog:

* gcc/config/mips/mips.cc(mips_expand_ins_as_unaligned_store):
Add logics for generating instruction.
* gcc/config/mips/mips.h(ISA_HAS_LWL_LWR): Add clause for 
ISA_HAS_MIPS16E2.
* gcc/config/mips/mips.md(mov_l): Generates instructions.
(mov_r): Same as above.
(mov_l): Adjusted for the conditions above.
(mov_r): Same as above.
(mov_l_mips16e2): Add machine description for `define_insn 
mov_l_mips16e2`.
(mov_r_mips16e2): Add machine description for `define_insn 
mov_r_mips16e2`.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: New tests for mips16e2.
---
 gcc/config/mips/mips.cc  |  15 ++-
 gcc/config/mips/mips.h   |   2 +-
 gcc/config/mips/mips.md  |  43 +++--
 gcc/testsuite/gcc.target/mips/mips16e2.c | 116 +++
 4 files changed, 168 insertions(+), 8 deletions(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 0792f89cab4..275efc5a390 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -8603,12 +8603,25 @@ mips_expand_ins_as_unaligned_store (rtx dest, rtx src, 
HOST_WIDE_INT width,
 return false;
 
   mode = int_mode_for_size (width, 0).require ();
-  src = gen_lowpart (mode, src);
+  if (TARGET_MIPS16
+  && src == const0_rtx)
+src = force_reg (mode, src);
+  else
+src = gen_lowpart (mode, src);
+
   if (mode == DImode)
 {
+  if (TARGET_MIPS16)
+   gcc_unreachable ();
   emit_insn (gen_mov_sdl (dest, src, left));
   emit_insn (gen_mov_sdr (copy_rtx (dest), copy_rtx (src), right));
 }
+  else if (TARGET_MIPS16)
+{
+  emit_insn (gen_mov_swl_mips16e2 (dest, src, left));
+  emit_insn (gen_mov_swr_mips16e2 (copy_rtx (dest), copy_rtx (src),
+  right));
+}
   else
 {
   emit_insn (gen_mov_swl (dest, src, left));
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index cab5ff422a8..a5c121088b7 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1180,7 +1180,7 @@ struct mips_cpu_info {
  && (MODE) == V2SFmode))   \
 && !TARGET_MIPS16)
 
-#define ISA_HAS_LWL_LWR(mips_isa_rev <= 5 && !TARGET_MIPS16)
+#define ISA_HAS_LWL_LWR(mips_isa_rev <= 5 && (!TARGET_MIPS16 
|| ISA_HAS_MIPS16E2))
 
 #define ISA_HAS_IEEE_754_LEGACY(mips_isa_rev <= 5)
 
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 73c9acd484f..5ef8d99d99c 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -4488,10 +4488,12 @@
(unspec:GPR [(match_operand:BLK 1 "memory_operand" "m")
 (match_operand:QI 2 "memory_operand" "ZC")]
UNSPEC_LOAD_LEFT))]
-  "!TARGET_MIPS16 && mips_mem_fits_mode_p (mode, operands[1])"
+  "(!TARGET_MIPS16 || ISA_HAS_MIPS16E2)
+&& mips_mem_fits_mode_p (mode, operands[1])"
   "l\t%0,%2"
   [(set_attr "move_type" "load")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")
+   (set_attr "extended_mips16" "yes")])
 
 (define_insn "mov_r"
   [(set (match_operand:GPR 0 "register_operand" "=d")
@@ -4499,17 +4501,20 @@
 (match_operand:QI 2 "memory_operand" "ZC")
 (match_operand:GPR 3 "register_operand" "0")]
UNSPEC_LOAD_RIGHT))]
-  "!TARGET_MIPS16 && mips_mem_fits_mode_p (mode, operands[1])"
+  "(!TARGET_MIPS16 || ISA_HAS_MIPS16E2)
+&& mips_mem_fits_mode_p (mode, operands[1])"
   "r\t%0,%2"
   [(set_attr "move_type" "load")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")
+   (set_attr "extended_mips16" "yes")])
 
 (define_insn "mov_l"
   [(set (match_operand:BLK 0 "memory_operand" "=m")
(unspec:BLK [(match_operand:GPR 1 "reg_or_0_operand" "dJ")
 (match_operand:QI 2 "memory_operand" "ZC")]
UNSPEC_STORE_LEFT))]
-  "!TARGET_MIPS16 && mips_mem_fits_mode_p (mode, operands[0])"
+  "!TARGET_MIPS16
+   && mips_mem_fits_mode_p (mode, operands[0])"
   "l\t%z1,%2"
   [(set_attr "move_type" "store")
(set_attr "mode" "")])
@@ -4520,11 +4525,37 @@
 (match_operand:QI 2 "memory_operand" "ZC")
 (match_dup 0)]
UNSPEC_STORE_RIGHT))]
-  "!TARGET_MIPS16 && mips_mem_fits_mode_p (mode, operands[0])"
+  "!TARGET_MIPS16
+   && mips_mem_fits_mode_p (mode, operands[0])"
   "r\t%z1,%2"
   [(set_attr "move_type" "store")
(set_attr "mode" "")])
 
+(define_insn "mov_l_mips16e2"
+  [(set (match_operand:BLK 0 "memory_operand" "=m")
+(unspec:BLK [(match_operand:GPR 1 "register_operand" "d")
+ (match_operand:QI 2 "memory_operand" "ZC")]
+UNSPEC_STORE_LEFT))]
+  "TARGET_MIPS16 && ISA_HAS_MIPS16E2
+   && mips_mem_fits_mode_p (mode, operands[0])"
+  "l\t%1,%2"
+ 

[PATCH 2/8] MIPS: Add MOVx instructions support for mips16e2

2023-05-05 Thread Jie Mei
This patch adds the MOVx instructions from mips16e2
(movn, movz, movtn, movtz) with corresponding tests.
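As a reference point, the classic C pattern that conditional-move instructions target (an illustrative example with a hypothetical function name; the committed tests live in mips16e2-cmov.c):

```c
/* With optimization, targets that satisfy ISA_HAS_CONDMOVE can compile
   this selection without a branch, e.g. using movn/movz.  */
int
select_value (int cond, int a, int b)
{
  return cond ? a : b;
}
```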

gcc/ChangeLog:

* config/mips/mips.h (ISA_HAS_CONDMOVE): Add condition for
ISA_HAS_MIPS16E2.
* config/mips/mips.md (*mov_on_): Exclude MIPS16.
(*mov_on__mips16e2): New define_insn generating the MOVx
instructions.
(*mov_on__ne): Exclude MIPS16.
(*mov_on__ne_mips16e2): New define_insn generating the MOVx
instructions.
* config/mips/predicates.md (reg_or_0_operand_mips16e2): New
predicate for the MOVx instructions.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-cmov.c: New test for the MOVx instructions.
---
 gcc/config/mips/mips.h|  1 +
 gcc/config/mips/mips.md   | 38 ++-
 gcc/config/mips/predicates.md |  6 ++
 gcc/testsuite/gcc.target/mips/mips16e2-cmov.c | 68 +++
 4 files changed, 111 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-cmov.c

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 8db92c6468f..c396e5ea2f3 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1081,6 +1081,7 @@ struct mips_cpu_info {
ST Loongson 2E/2F.  */
 #define ISA_HAS_CONDMOVE(ISA_HAS_FP_CONDMOVE   \
 || TARGET_MIPS5900 \
+|| ISA_HAS_MIPS16E2\
 || TARGET_LOONGSON_2EF)
 
 /* ISA has LDC1 and SDC1.  */
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index ac1d77afc7d..48d5f419ce0 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -7341,26 +7341,60 @@
 (const_int 0)])
 (match_operand:GPR 2 "reg_or_0_operand" "dJ,0")
 (match_operand:GPR 3 "reg_or_0_operand" "0,dJ")))]
-  "ISA_HAS_CONDMOVE"
+  "!TARGET_MIPS16 && ISA_HAS_CONDMOVE"
   "@
 mov%T4\t%0,%z2,%1
 mov%t4\t%0,%z3,%1"
   [(set_attr "type" "condmove")
(set_attr "mode" "")])
 
+(define_insn "*mov_on__mips16e2"
+  [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d")
+(if_then_else:GPR
+ (match_operator 4 "equality_operator"
+[(match_operand:MOVECC 1 "register_operand" 
",,t,t")
+ (const_int 0)])
+ (match_operand:GPR 2 "reg_or_0_operand_mips16e2" "dJ,0,dJ,0")
+ (match_operand:GPR 3 "reg_or_0_operand_mips16e2" "0,dJ,0,dJ")))]
+  "ISA_HAS_MIPS16E2 && ISA_HAS_CONDMOVE"
+  "@
+mov%T4\t%0,%z2,%1
+mov%t4\t%0,%z3,%1
+movt%T4\t%0,%z2
+movt%t4\t%0,%z3"
+  [(set_attr "type" "condmove")
+   (set_attr "mode" "")
+   (set_attr "extended_mips16" "yes")])
+
 (define_insn "*mov_on__ne"
   [(set (match_operand:GPR 0 "register_operand" "=d,d")
(if_then_else:GPR
 (match_operand:GPR2 1 "register_operand" ",")
 (match_operand:GPR 2 "reg_or_0_operand" "dJ,0")
 (match_operand:GPR 3 "reg_or_0_operand" "0,dJ")))]
-  "ISA_HAS_CONDMOVE"
+  "!TARGET_MIPS16 && ISA_HAS_CONDMOVE"
   "@
 movn\t%0,%z2,%1
 movz\t%0,%z3,%1"
   [(set_attr "type" "condmove")
(set_attr "mode" "")])
 
+(define_insn "*mov_on__ne_mips16e2"
+  [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d")
+   (if_then_else:GPR
+(match_operand:GPR2 1 "register_operand" ",,t,t")
+(match_operand:GPR 2 "reg_or_0_operand_mips16e2" "dJ,0,dJ,0")
+(match_operand:GPR 3 "reg_or_0_operand_mips16e2" "0,dJ,0,dJ")))]
+ "ISA_HAS_MIPS16E2 && ISA_HAS_CONDMOVE"
+  "@
+movn\t%0,%z2,%1
+movz\t%0,%z3,%1
+movtn\t%0,%z2
+movtz\t%0,%z3"
+  [(set_attr "type" "condmove")
+   (set_attr "mode" "")
+   (set_attr "extended_mips16" "yes")])
+
 (define_insn "*mov_on_"
   [(set (match_operand:SCALARF 0 "register_operand" "=f,f")
(if_then_else:SCALARF
diff --git a/gcc/config/mips/predicates.md b/gcc/config/mips/predicates.md
index e34de2937cc..9ffaed689a3 100644
--- a/gcc/config/mips/predicates.md
+++ b/gcc/config/mips/predicates.md
@@ -114,6 +114,12 @@
(not (match_test "TARGET_MIPS16")))
(match_operand 0 "register_operand")))
 
+(define_predicate "reg_or_0_operand_mips16e2"
+  (ior (and (match_operand 0 "const_0_operand")
+(ior (not (match_test "TARGET_MIPS16"))
+ (match_test "ISA_HAS_MIPS16E2")))
+   (match_operand 0 "register_operand")))
+
 (define_predicate "const_1_operand"
   (and (match_code "const_int,const_double,const_vector")
(match_test "op == CONST1_RTX (GET_MODE (op))")))
diff --git a/gcc/testsuite/gcc.target/mips/mips16e2-cmov.c 
b/gcc/testsuite/gcc.target/mips/mips16e2-cmov.c
new file mode 100644
index 000..6e9dd82ebf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/mips16e2-cmov.c
@@ -0,0 +1,68 @@
+/* { dg-options "-mno-abicalls -mgpopt -G8 -mabi=32 -mips16 -mmips16e2" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+
+/* Test MOVN.  */
+
+/* { dg-final { scan-assembler-times "test01:.*\tmovn\t.*test01\n"

[PATCH 3/8] MIPS: Add instruction about global pointer register for mips16e2

2023-05-05 Thread Jie Mei
The mips16e2 ASE uses eight general-purpose registers
from mips32 (s0-s1, v0-v1 and a0-a3) together with the
special-purpose registers t8, gp, sp and ra.

One of those special registers, gp, is the global
pointer register; it is used by several instructions
in the ASE, for instance ADDIU and LB/LBU.

This patch adds these instructions with corresponding tests.
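For illustration, the kind of source that benefits (a hypothetical example: with -mno-abicalls -G8, a small global like this is placed in the small-data section and addressed relative to $gp, which is what the gp-based MIPS16e2 forms make directly reachable):

```c
/* Small global data; with -G8 it lands in .sdata and is addressed
   gp-relative, so MIPS16e2 can load it without first copying $gp
   into a core register.  */
int counter;

int
get_counter (void)
{
  return counter;
}
```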

gcc/ChangeLog:

* config/mips/mips.cc (mips_regno_mode_ok_for_base_p): Allow the
global pointer register as a base register.
(mips16_unextended_reference_p): Do not treat gp-based references
as unextended.
(mips_pic_base_register): Use $gp directly when MIPS16_GP_LOADS.
(mips_init_relocs): Do not split gp-relative symbols when
MIPS16_GP_LOADS.
* config/mips/mips.h (MIPS16_GP_LOADS): New macro.
(GLOBAL_POINTER_REGNUM): Move to mips.md.
* config/mips/mips.md (GLOBAL_POINTER_REGNUM): Moved from mips.h.
(*lowsi_mips16_gp): New define_insn.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-gp.c: New tests for mips16e2.
---
 gcc/config/mips/mips.cc |  10 +-
 gcc/config/mips/mips.h  |   6 +-
 gcc/config/mips/mips.md |  11 +++
 gcc/testsuite/gcc.target/mips/mips16e2-gp.c | 101 
 4 files changed, 121 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-gp.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 585a3682c7b..be470bbb50d 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -2474,6 +2474,9 @@ mips_regno_mode_ok_for_base_p (int regno, machine_mode 
mode,
   if (TARGET_MIPS16 && regno == STACK_POINTER_REGNUM)
 return GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
 
+  if (MIPS16_GP_LOADS && regno == GLOBAL_POINTER_REGNUM)
+return (UNITS_PER_WORD > 4 ? GET_MODE_SIZE (mode) <= 4 : true);
+
   return TARGET_MIPS16 ? M16_REG_P (regno) : GP_REG_P (regno);
 }
 
@@ -2689,7 +2692,8 @@ static bool
 mips16_unextended_reference_p (machine_mode mode, rtx base,
   unsigned HOST_WIDE_INT offset)
 {
-  if (mode != BLKmode && offset % GET_MODE_SIZE (mode) == 0)
+  if (mode != BLKmode && offset % GET_MODE_SIZE (mode) == 0
+  && REGNO (base) != GLOBAL_POINTER_REGNUM)
 {
   if (GET_MODE_SIZE (mode) == 4 && base == stack_pointer_rtx)
return offset < 256U * GET_MODE_SIZE (mode);
@@ -3249,7 +3253,7 @@ mips16_gp_pseudo_reg (void)
 rtx
 mips_pic_base_register (rtx temp)
 {
-  if (!TARGET_MIPS16)
+  if (MIPS16_GP_LOADS || !TARGET_MIPS16)
 return pic_offset_table_rtx;
 
   if (currently_expanding_to_rtl)
@@ -8756,7 +8760,7 @@ mips_init_relocs (void)
}
 }
 
-  if (TARGET_MIPS16)
+  if (!MIPS16_GP_LOADS && TARGET_MIPS16)
 {
   /* The high part is provided by a pseudo copy of $gp.  */
   mips_split_p[SYMBOL_GP_RELATIVE] = true;
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index c396e5ea2f3..8a6e43407c5 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1375,6 +1375,8 @@ struct mips_cpu_info {
 /* ISA includes the pop instruction.  */
 #define ISA_HAS_POP(TARGET_OCTEON && !TARGET_MIPS16)
 
+#define MIPS16_GP_LOADS(ISA_HAS_MIPS16E2 && !TARGET_64BIT)
+
 /* The CACHE instruction is available in non-MIPS16 code.  */
 #define TARGET_CACHE_BUILTIN (mips_isa >= MIPS_ISA_MIPS3)
 
@@ -2067,10 +2069,6 @@ FP_ASM_SPEC "\
function address than to call an address kept in a register.  */
 #define NO_FUNCTION_CSE 1
 
-/* The ABI-defined global pointer.  Sometimes we use a different
-   register in leaf functions: see PIC_OFFSET_TABLE_REGNUM.  */
-#define GLOBAL_POINTER_REGNUM (GP_REG_FIRST + 28)
-
 /* We normally use $28 as the global pointer.  However, when generating
n32/64 PIC, it is better for leaf functions to use a call-clobbered
register instead.  They can then avoid saving and restoring $28
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 48d5f419ce0..9de5013aad1 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -167,6 +167,7 @@
(GET_FCSR_REGNUM2)
(SET_FCSR_REGNUM4)
(PIC_FUNCTION_ADDR_REGNUM   25)
+   (GLOBAL_POINTER_REGNUM  28)
(RETURN_ADDR_REGNUM 31)
(CPRESTORE_SLOT_REGNUM  76)
(GOT_VERSION_REGNUM 79)
@@ -4678,6 +4679,16 @@
   [(set_attr "alu_type" "add")
(set_attr "mode" "")])
 
+(define_insn "*lowsi_mips16_gp"
+  [(set (match_operand:SI 0 "register_operand" "=d")
+(lo_sum:SI (reg:SI GLOBAL_POINTER_REGNUM)
+  (match_operand 1 "immediate_operand" "")))]
+  "MIPS16_GP_LOADS"
+  "addiu\t%0,$28,%R1"
+  [(set_attr "alu_type" "add")
+   (set_attr "mode" "SI")
+   (set_attr "extended_mips16" "yes")])
+
 (define_insn "*low_mips16"
   [(set (match_operand:P 0 "register_operand" "=d")
(lo_sum:P (match_operand:P 1 "register_operand" "0")

[PATCH 0/8] MIPS: Add MIPS16e2 ASE instrucions.

2023-05-05 Thread Jie Mei
The MIPS16e2 ASE is an enhancement to the MIPS16e ASE,
which includes all MIPS16e instructions, with some additions.

This series of patches adds all instructions of the MIPS16E2 ASE.

Jie Mei (8):
  MIPS: Add basic support for mips16e2
  MIPS: Add MOVx instructions support for mips16e2
  MIPS: Add instruction about global pointer register for mips16e2
  MIPS: Add bitwise instructions for mips16e2
  MIPS: Add LUI instruction for mips16e2
  MIPS: Add load/store word left/right instructions for mips16e2
  MIPS: Use ISA_HAS_9BIT_DISPLACEMENT for mips16e2
  MIPS: Add CACHE instruction for mips16e2

 gcc/config/mips/constraints.md|   4 +
 gcc/config/mips/mips-protos.h |   4 +
 gcc/config/mips/mips.cc   | 164 ++--
 gcc/config/mips/mips.h|  32 ++-
 gcc/config/mips/mips.md   | 188 --
 gcc/config/mips/mips.opt  |   4 +
 gcc/config/mips/predicates.md |  19 +-
 gcc/doc/invoke.texi   |   7 +
 gcc/testsuite/gcc.target/mips/mips.exp|  10 +
 .../gcc.target/mips/mips16e2-cache.c  |  34 +++
 gcc/testsuite/gcc.target/mips/mips16e2-cmov.c |  68 +
 gcc/testsuite/gcc.target/mips/mips16e2-gp.c   | 101 
 gcc/testsuite/gcc.target/mips/mips16e2.c  | 240 ++
 13 files changed, 816 insertions(+), 59 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-cache.c
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-cmov.c
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-gp.c
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2.c

-- 
2.40.1


[PATCH 1/8] MIPS: Add basic support for mips16e2

2023-05-05 Thread Jie Mei
The MIPS16e2 ASE is an enhancement to the MIPS16e ASE,
which includes all MIPS16e instructions, with some additions.
It defines new special instructions for increasing
code density (e.g. Extend, PC-relative instructions, etc.).

This patch adds basic support for mips16e2 used by the
following series of patches.
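Once this lands, source code can test the new predefine; a minimal feature-test sketch (only the macro name __mips_mips16e2 comes from the patch, the rest is hypothetical):

```c
/* Select code paths based on the __mips_mips16e2 predefine added by
   this patch; on targets without the ASE this evaluates to 0.  */
#if defined (__mips_mips16e2)
#  define HAVE_MIPS16E2 1
#else
#  define HAVE_MIPS16E2 0
#endif

int
mips16e2_available (void)
{
  return HAVE_MIPS16E2;
}
```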

gcc/ChangeLog:

* config/mips/mips.cc (mips_file_start): Output ".module mips16e2"
when TARGET_MIPS16E2.
* config/mips/mips.h (__mips_mips16e2): New predefined macro.
(ISA_HAS_MIPS16E2): New macro.
(FP_ASM_SPEC): Pass -mmips16e2 to the assembler.
* config/mips/mips.opt: Add -m(no-)mips16e2 option.
* doc/invoke.texi: Document it.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips.exp (mips_option_groups): Add -mmips16e2
option.
(mips-dg-init): Handle recognition of mips16e2 targets.
(mips-dg-options): Add dependencies for mips16e2.
---
 gcc/config/mips/mips.cc|  3 ++-
 gcc/config/mips/mips.h |  8 
 gcc/config/mips/mips.opt   |  4 
 gcc/doc/invoke.texi|  7 +++
 gcc/testsuite/gcc.target/mips/mips.exp | 10 ++
 5 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index ca822758b41..585a3682c7b 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -10047,7 +10047,8 @@ mips_file_start (void)
 fputs ("\t.module\tmsa\n", asm_out_file);
   if (TARGET_XPA)
 fputs ("\t.module\txpa\n", asm_out_file);
-  /* FIXME: MIPS16E2 is not supported by GCC? gas does support it */
+  if (TARGET_MIPS16E2)
+fputs ("\t.module\tmips16e2\n", asm_out_file);
   if (TARGET_CRC)
 fputs ("\t.module\tcrc\n", asm_out_file);
   if (TARGET_GINV)
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 6daf6d37165..8db92c6468f 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -475,6 +475,9 @@ struct mips_cpu_info {
   if (mips_base_compression_flags & MASK_MIPS16)   \
builtin_define ("__mips16");\
\
+  if (TARGET_MIPS16E2) \
+   builtin_define ("__mips_mips16e2"); \
+   \
   if (TARGET_MIPS3D)   \
builtin_define ("__mips3d");\
\
@@ -1291,6 +1294,10 @@ struct mips_cpu_info {
 /* The MSA ASE is available.  */
 #define ISA_HAS_MSA(TARGET_MSA && !TARGET_MIPS16)
 
+/* The MIPS16e V2 instructions are available.  */
+#define ISA_HAS_MIPS16E2   (TARGET_MIPS16 && TARGET_MIPS16E2 \
+   && !TARGET_64BIT)
+
 /* True if the result of a load is not available to the next instruction.
A nop will then be needed between instructions like "lw $4,..."
and "addiu $4,$4,1".  */
@@ -1401,6 +1408,7 @@ struct mips_cpu_info {
 
 #ifdef HAVE_AS_DOT_MODULE
 #define FP_ASM_SPEC "\
+%{mmips16e2} \
 %{mhard-float} %{msoft-float} \
 %{msingle-float} %{mdouble-float}"
 #else
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index 195f5be01cc..4968ed0d544 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -380,6 +380,10 @@ msplit-addresses
 Target Mask(SPLIT_ADDRESSES)
 Optimize lui/addiu address loads.
 
+mmips16e2
+Target Var(TARGET_MIPS16E2) Init(0)
+Enable the MIPS16e V2 instructions.
+
 msym32
 Target Var(TARGET_SYM32)
 Assume all symbols have 32-bit values.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a38547f53e5..0b1cef7c330 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -26709,6 +26709,13 @@ MIPS16 code generation can also be controlled on a 
per-function basis
 by means of @code{mips16} and @code{nomips16} attributes.
 @xref{Function Attributes}, for more information.
 
+@opindex mmips16e2
+@opindex mno-mips16e2
+@item -mmips16e2
+@itemx -mno-mips16e2
+Use (do not use) the MIPS16e2 ASE.  This option modifies the behavior
+of the @option{-mips16} option such that it targets the MIPS16e2 ASE@.
+
 @opindex mflip-mips16
 @item -mflip-mips16
 Generate MIPS16 code on alternating functions.  This option is provided
diff --git a/gcc/testsuite/gcc.target/mips/mips.exp 
b/gcc/testsuite/gcc.target/mips/mips.exp
index 15d574202d3..e79f685ceb0 100644
--- a/gcc/testsuite/gcc.target/mips/mips.exp
+++ b/gcc/testsuite/gcc.target/mips/mips.exp
@@ -301,6 +301,7 @@ foreach option {
 loongson-mmi
 loongson-ext
 loongson-ext2
+mips16e2
 } {
 lappend mips_option_groups $option "-m(no-|)$option"
 }
@@ -821,6 +822,12 @@ proc mips-dg-init {} {
"-mno-mips16",
#endif
 
+   #ifdef 

[PATCH 4/8] MIPS: Add bitwise instructions for mips16e2

2023-05-05 Thread Jie Mei
The mips16e2 ASE provides shortened bitwise instructions,
for instance ANDI, ORI/XORI, EXT and INS.

This patch adds these instructions with corresponding tests.

gcc/ChangeLog:

* config/mips/constraints.md (Yz): New constraint for mips16e2.
* config/mips/mips-protos.h (mips_bit_clear_p): Declare new
function.
(mips_bit_clear_info): Likewise.
* config/mips/mips.cc (mips_bit_clear_info): New function for
generating instructions.
(mips_bit_clear_p): Likewise.
* config/mips/mips.h (ISA_HAS_EXT_INS): Add clause for
ISA_HAS_MIPS16E2.
* config/mips/mips.md (extended_mips16): Generate EXT and INS
instructions.
(*and3): Generate INS instruction.
(*and3_mips16): Generate EXT, INS and ANDI instructions.
(ior3): Add logic for ORI instruction.
(*ior3_mips16_asmacro): Generate ORI instruction.
(*ior3_mips16): Add logic for XORI instruction.
(*xor3_mips16): Generate XORI instruction.
(*extzv): Add logic for EXT instruction.
(*insv): Add logic for INS instruction.
* config/mips/predicates.md (bit_clear_operand): New predicate for
generating bitwise instructions.
(and_reg_operand): Add logic for generating bitwise instructions.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: New tests for mips16e2.
---
 gcc/config/mips/constraints.md   |   4 +
 gcc/config/mips/mips-protos.h|   4 +
 gcc/config/mips/mips.cc  |  67 ++-
 gcc/config/mips/mips.h   |   3 +-
 gcc/config/mips/mips.md  |  91 
 gcc/config/mips/predicates.md|  13 ++-
 gcc/testsuite/gcc.target/mips/mips16e2.c | 102 +++
 7 files changed, 263 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2.c

diff --git a/gcc/config/mips/constraints.md b/gcc/config/mips/constraints.md
index 49d1a43c613..22d4d84f074 100644
--- a/gcc/config/mips/constraints.md
+++ b/gcc/config/mips/constraints.md
@@ -264,6 +264,10 @@
   (and (match_code "const_vector")
(match_test "op == CONST0_RTX (mode)")))
 
+(define_constraint "Yz"
+  "@internal"
+  (match_operand 0 "bit_clear_operand"))
+
 (define_constraint "YA"
   "@internal
An unsigned 6-bit constant."
diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index 20483469105..2791b9f220a 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -388,4 +388,8 @@ extern void mips_register_frame_header_opt (void);
 extern void mips_expand_vec_cond_expr (machine_mode, machine_mode, rtx *);
 extern void mips_expand_vec_cmp_expr (rtx *);
 
+extern bool mips_bit_clear_p (enum machine_mode, unsigned HOST_WIDE_INT);
+extern void mips_bit_clear_info (enum machine_mode, unsigned HOST_WIDE_INT,
+ int *, int *);
+
 #endif /* ! GCC_MIPS_PROTOS_H */
diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index be470bbb50d..d86911d10c2 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -3895,6 +3895,10 @@ mips16_constant_cost (int code, HOST_WIDE_INT x)
return 0;
   return -1;
 
+case ZERO_EXTRACT:
+  /* The bit position and size are immediate operands.  */
+  return ISA_HAS_EXT_INS ? COSTS_N_INSNS (1) : -1;
+
 default:
   return -1;
 }
@@ -22753,7 +22757,68 @@ mips_asm_file_end (void)
   if (NEED_INDICATE_EXEC_STACK)
 file_end_indicate_exec_stack ();
 }
-
+
+void
+mips_bit_clear_info (enum machine_mode mode, unsigned HOST_WIDE_INT m,
+ int *start_pos, int *size)
+{
+  unsigned int shift = 0;
+  unsigned int change_count = 0;
+  unsigned int prev_val = 1;
+  unsigned int curr_val = 0;
+  unsigned int end_pos = GET_MODE_SIZE (mode) * BITS_PER_UNIT;
+
+  for (shift = 0 ; shift < (GET_MODE_SIZE (mode) * BITS_PER_UNIT) ; shift++)
+{
+  curr_val = (unsigned int)((m & (unsigned int)(1 << shift)) >> shift);
+  if (curr_val != prev_val)
+{
+  change_count++;
+  switch (change_count)
+{
+  case 1:
+*start_pos = shift;
+break;
+  case 2:
+end_pos = shift;
+break;
+  default:
+gcc_unreachable ();
+}
+}
+  prev_val = curr_val;
+   }
+  *size = (end_pos - *start_pos);
+}
+
+bool
+mips_bit_clear_p (enum machine_mode mode, unsigned HOST_WIDE_INT m)
+{
+  unsigned int shift = 0;
+  unsigned int change_count = 0;
+  unsigned int prev_val = 1;
+  unsigned int curr_val = 0;
+
+  if (mode != SImode && mode != VOIDmode)
+return false;
+
+  if (!ISA_HAS_EXT_INS)
+return false;
+
+  for (shift = 0 ; shift < (UNITS_PER_WORD * BITS_PER_UNIT) ; shift++)
+{
+  curr_val = (unsigned int)((m & (unsigned int)(1 << shift)) >> shift);
+

[PATCH 7/8] MIPS: Use ISA_HAS_9BIT_DISPLACEMENT for mips16e2

2023-05-05 Thread Jie Mei
The MIPS16e2 ASE has PREF, LL and SC instructions; like
MIPS32r6, they use a 9-bit immediate, whereas pre-R6
MIPS32 uses a 16-bit immediate.
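For reference, the kind of source that exercises LL/SC (an illustrative example; the patch itself only adjusts the ISA macros):

```c
/* On MIPS targets where ISA_HAS_LL_SC holds, GCC expands this atomic
   read-modify-write as an ll/sc retry loop; with MIPS16e2 that loop
   can now also be emitted in MIPS16 mode.  */
int
fetch_add (int *p, int v)
{
  return __atomic_fetch_add (p, v, __ATOMIC_SEQ_CST);
}
```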

gcc/ChangeLog:

* gcc/config/mips/mips.h(ISA_HAS_9BIT_DISPLACEMENT): Add clause
for ISA_HAS_MIPS16E2.
(ISA_HAS_SYNC): Same as above.
(ISA_HAS_LL_SC): Same as above.
---
 gcc/config/mips/mips.h | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index a5c121088b7..1947be25aca 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1247,7 +1247,8 @@ struct mips_cpu_info {
 && !TARGET_MIPS16)
 
 /* ISA has data prefetch, LL and SC with limited 9-bit displacement.  */
-#define ISA_HAS_9BIT_DISPLACEMENT  (mips_isa_rev >= 6)
+#define ISA_HAS_9BIT_DISPLACEMENT  (mips_isa_rev >= 6  \
+|| ISA_HAS_MIPS16E2)
 
 /* ISA has data indexed prefetch instructions.  This controls use of
'prefx', along with TARGET_HARD_FLOAT and TARGET_DOUBLE_FLOAT.
@@ -1340,7 +1341,8 @@ struct mips_cpu_info {
 #define ISA_HAS_SYNCI (mips_isa_rev >= 2 && !TARGET_MIPS16)
 
 /* ISA includes sync.  */
-#define ISA_HAS_SYNC ((mips_isa >= MIPS_ISA_MIPS2 || TARGET_MIPS3900) && 
!TARGET_MIPS16)
+#define ISA_HAS_SYNC ((mips_isa >= MIPS_ISA_MIPS2 || TARGET_MIPS3900)  \
+ && (!TARGET_MIPS16 || ISA_HAS_MIPS16E2))
 #define GENERATE_SYNC  \
   (target_flags_explicit & MASK_LLSC   \
? TARGET_LLSC && !TARGET_MIPS16 \
@@ -1349,7 +1351,8 @@ struct mips_cpu_info {
 /* ISA includes ll and sc.  Note that this implies ISA_HAS_SYNC
because the expanders use both ISA_HAS_SYNC and ISA_HAS_LL_SC
instructions.  */
-#define ISA_HAS_LL_SC (mips_isa >= MIPS_ISA_MIPS2 && !TARGET_MIPS5900 && 
!TARGET_MIPS16)
+#define ISA_HAS_LL_SC (mips_isa >= MIPS_ISA_MIPS2 && !TARGET_MIPS5900  \
+  && (!TARGET_MIPS16 || ISA_HAS_MIPS16E2))
 #define GENERATE_LL_SC \
   (target_flags_explicit & MASK_LLSC   \
? TARGET_LLSC && !TARGET_MIPS16 \
-- 
2.40.1


[PATCH 8/8] MIPS: Add CACHE instruction for mips16e2

2023-05-05 Thread Jie Mei
This patch adds the CACHE instruction from mips16e2
with corresponding tests.
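A usage sketch of the builtin this patch makes available in MIPS16e2 mode (hypothetical wrapper; the op value is illustrative — 0x15 is Hit Writeback Invalidate D on many MIPS implementations — and the call is guarded so the sketch also compiles on non-MIPS hosts):

```c
/* Writeback-invalidate the D-cache line holding P using the GCC MIPS
   cache builtin; a no-op when not compiling for MIPS.  */
void
flush_line (void *p)
{
#ifdef __mips__
  __builtin_mips_cache (0x15, p);
#else
  (void) p;
#endif
}
```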

gcc/ChangeLog:

* config/mips/mips.cc (mips_9bit_offset_address_p): Restrict the
address register to M16_REGS for MIPS16.
(BUILTIN_AVAIL_MIPS16E2): New macro.
(AVAIL_MIPS16E2_OR_NON_MIPS16): Likewise.
(AVAIL_NON_MIPS16 (cache..)): Update to
AVAIL_MIPS16E2_OR_NON_MIPS16.
* config/mips/mips.h (ISA_HAS_CACHE): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md (mips_cache): Mark as extended MIPS16.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-cache.c: New tests for mips16e2.
---
 gcc/config/mips/mips.cc   | 25 --
 gcc/config/mips/mips.h|  3 +-
 gcc/config/mips/mips.md   |  3 +-
 .../gcc.target/mips/mips16e2-cache.c  | 34 +++
 4 files changed, 60 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-cache.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 275efc5a390..e6f4701ad3a 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -2845,6 +2845,9 @@ mips_9bit_offset_address_p (rtx x, machine_mode mode)
   return (mips_classify_address (&addr, x, mode, false)
  && addr.type == ADDRESS_REG
  && CONST_INT_P (addr.offset)
+ && (!TARGET_MIPS16E2
+ || M16_REG_P (REGNO (addr.reg))
+ || REGNO (addr.reg) >= FIRST_PSEUDO_REGISTER)
  && MIPS_9BIT_OFFSET_P (INTVAL (addr.offset)));
 }
 
@@ -15412,9 +15415,13 @@ mips_loongson_ext2_prefetch_cookie (rtx write, rtx)
The function is available on the current target if !TARGET_MIPS16.
 
BUILTIN_AVAIL_MIPS16
-   The function is available on the current target if TARGET_MIPS16.  */
+   The function is available on the current target if TARGET_MIPS16.
+
+   BUILTIN_AVAIL_MIPS16E2
+   The function is available on the current target if TARGET_MIPS16E2.  */
 #define BUILTIN_AVAIL_NON_MIPS16 1
 #define BUILTIN_AVAIL_MIPS16 2
+#define BUILTIN_AVAIL_MIPS16E2 4
 
 /* Declare an availability predicate for built-in functions that
require non-MIPS16 mode and also require COND to be true.
@@ -15426,6 +15433,17 @@ mips_loongson_ext2_prefetch_cookie (rtx write, rtx)
return (COND) ? BUILTIN_AVAIL_NON_MIPS16 : 0;   \
  }
 
+/* Declare an availability predicate for built-in functions that
+   require non-MIPS16 mode or MIPS16E2 and also require COND to be true.
+   NAME is the main part of the predicate's name.  */
+#define AVAIL_MIPS16E2_OR_NON_MIPS16(NAME, COND)   \
+ static unsigned int   \
+ mips_builtin_avail_##NAME (void)  \
+ { \
+   return ((COND) ? BUILTIN_AVAIL_NON_MIPS16 | BUILTIN_AVAIL_MIPS16E2  \
+  : 0);\
+ }
+
 /* Declare an availability predicate for built-in functions that
support both MIPS16 and non-MIPS16 code and also require COND
to be true.  NAME is the main part of the predicate's name.  */
@@ -15471,7 +15489,7 @@ AVAIL_NON_MIPS16 (dsp_32, !TARGET_64BIT && TARGET_DSP)
 AVAIL_NON_MIPS16 (dsp_64, TARGET_64BIT && TARGET_DSP)
 AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BIT && TARGET_DSPR2)
 AVAIL_NON_MIPS16 (loongson, TARGET_LOONGSON_MMI)
-AVAIL_NON_MIPS16 (cache, TARGET_CACHE_BUILTIN)
+AVAIL_MIPS16E2_OR_NON_MIPS16 (cache, TARGET_CACHE_BUILTIN)
 AVAIL_NON_MIPS16 (msa, TARGET_MSA)
 
 /* Construct a mips_builtin_description from the given arguments.
@@ -17471,7 +17489,8 @@ mips_expand_builtin (tree exp, rtx target, rtx 
subtarget ATTRIBUTE_UNUSED,
   d = &mips_builtins[fcode];
   avail = d->avail ();
   gcc_assert (avail != 0);
-  if (TARGET_MIPS16 && !(avail & BUILTIN_AVAIL_MIPS16))
+  if (TARGET_MIPS16 && !(avail & BUILTIN_AVAIL_MIPS16)
+  && (!TARGET_MIPS16E2 || !(avail & BUILTIN_AVAIL_MIPS16E2)))
 {
   error ("built-in function %qE not supported for MIPS16",
 DECL_NAME (fndecl));
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 1947be25aca..207b8871b12 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1385,7 +1385,8 @@ struct mips_cpu_info {
 #define TARGET_CACHE_BUILTIN (mips_isa >= MIPS_ISA_MIPS3)
 
 /* The CACHE instruction is available.  */
-#define ISA_HAS_CACHE (TARGET_CACHE_BUILTIN && !TARGET_MIPS16)
+#define ISA_HAS_CACHE (TARGET_CACHE_BUILTIN && (!TARGET_MIPS16 \
+   || TARGET_MIPS16E2))
 
 /* Tell collect what flags to pass to nm.  */
 #ifndef NM_FLAGS
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 5ef8d99d99c..7eb65891820 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -5751,7 +5751,8 @@
 (match_operand:QI 1 "address_op

Re: [libstdc++] use strtold for from_chars even without locale

2023-05-05 Thread Florian Weimer via Gcc-patches
* Jonathan Wakely via Libstdc:

> We could use strtod for a single-threaded target (i.e.
> !defined(_GLIBCXX_HAS_GTHREADS) by changing the global locale using
> setlocale, instead of changing the per-thread locale using uselocale.

This is not generally safe because the call to setlocale is still
observable to applications in principle: a previously stored pointer
returned from setlocale could be invalidated.

Thanks,
Florian



Re: [libstdc++] use strtold for from_chars even without locale

2023-05-05 Thread Jonathan Wakely via Gcc-patches
On Fri, 5 May 2023 at 10:43, Florian Weimer wrote:

> * Jonathan Wakely via Libstdc:
>
> > We could use strtod for a single-threaded target (i.e.
> > !defined(_GLIBCXX_HAS_GTHREADS) by changing the global locale using
> > setlocale, instead of changing the per-thread locale using uselocale.
>
> This is not generally safe because the call to setlocale is still
> observable to applications in principle: a previously stored pointer
> returned from setlocale could be invalidated.
>
>
Ah yes, good point, thanks. I think that's a non-starter then. I still
think using RAII makes the from_chars_impl function easier to read, so
here's a version of that patch without the single-threaded conditions.
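The per-thread approach the implementation keeps can be sketched standalone (a hypothetical helper using POSIX.1-2008 newlocale/uselocale, not the libstdc++ code itself):

```c
#define _GNU_SOURCE
#include <locale.h>
#include <stdlib.h>

/* Parse a double in the "C" locale by switching only this thread's
   locale, so pointers previously returned by setlocale elsewhere in
   the program remain valid.  */
double
parse_double_c_locale (const char *s, char **end)
{
  locale_t c = newlocale (LC_ALL_MASK, "C", (locale_t) 0);
  if (!c)
    return strtod (s, end);    /* fall back to the current locale */
  locale_t old = uselocale (c);
  double d = strtod (s, end);
  uselocale (old);
  freelocale (c);
  return d;
}
```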
commit 4dc5b8864ec527e699d35880fbc706157113f92b
Author: Jonathan Wakely 
Date:   Thu May 4 15:22:07 2023

libstdc++: Use RAII types in strtod-based std::from_chars implementation

This adds auto_locale and auto_ferounding types to use RAII for changing
and restoring the local and floating-point environment when using strtod
to implement std::from_chars.

The destructors for the RAII objects run slightly later than the
previous statements that restored the locale/fenv, but the differences
are just some trivial assignments and an isinf call.

libstdc++-v3/ChangeLog:

* src/c++17/floating_from_chars.cc [USE_STRTOD_FOR_FROM_CHARS]
(auto_locale, auto_ferounding): New class types.
(from_chars_impl): Use auto_locale and auto_ferounding.

diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc 
b/libstdc++-v3/src/c++17/floating_from_chars.cc
index 78b9d92cdc0..7b3bdf445e3 100644
--- a/libstdc++-v3/src/c++17/floating_from_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
@@ -597,6 +597,69 @@ namespace
 return buf.c_str();
   }
 
+  // RAII type to change and restore the locale.
+  struct auto_locale
+  {
+#if _GLIBCXX_HAVE_USELOCALE
+// When we have uselocale we can change the current thread's locale.
+locale_t loc;
+locale_t orig;
+
+auto_locale()
+: loc(::newlocale(LC_ALL_MASK, "C", (locale_t)0))
+{
+  if (loc)
+   orig = ::uselocale(loc);
+  else
+   ec = errc{errno};
+}
+
+~auto_locale()
+{
+  if (loc)
+   {
+ ::uselocale(orig);
+ ::freelocale(loc);
+   }
+}
+#else
+// Otherwise, we can't change the locale and so strtod can't be used.
+auto_locale() = delete;
+#endif
+
+explicit operator bool() const noexcept { return ec == errc{}; }
+
+errc ec{};
+
+auto_locale(const auto_locale&) = delete;
+auto_locale& operator=(const auto_locale&) = delete;
+  };
+
+  // RAII type to change and restore the floating-point environment.
+  struct auto_ferounding
+  {
+#if _GLIBCXX_USE_C99_FENV_TR1 && defined(FE_TONEAREST)
+const int rounding = std::fegetround();
+
+auto_ferounding()
+{
+  if (rounding != FE_TONEAREST)
+   std::fesetround(FE_TONEAREST);
+}
+
+~auto_ferounding()
+{
+  if (rounding != FE_TONEAREST)
+   std::fesetround(rounding);
+}
+#else
+auto_ferounding() = default;
+#endif
+
+auto_ferounding(const auto_ferounding&) = delete;
+auto_ferounding& operator=(const auto_ferounding&) = delete;
+  };
+
   // Convert the NTBS `str` to a floating-point value of type `T`.
   // If `str` cannot be converted, `value` is unchanged and `0` is returned.
   // Otherwise, let N be the number of characters consumed from `str`.
@@ -607,16 +670,11 @@ namespace
   ptrdiff_t
   from_chars_impl(const char* str, T& value, errc& ec) noexcept
   {
-if (locale_t loc = ::newlocale(LC_ALL_MASK, "C", (locale_t)0)) [[likely]]
+auto_locale loc;
+
+if (loc)
   {
-   locale_t orig = ::uselocale(loc);
-
-#if _GLIBCXX_USE_C99_FENV_TR1 && defined(FE_TONEAREST)
-   const int rounding = std::fegetround();
-   if (rounding != FE_TONEAREST)
- std::fesetround(FE_TONEAREST);
-#endif
-
+   auto_ferounding rounding;
const int save_errno = errno;
errno = 0;
char* endptr;
@@ -647,14 +705,6 @@ namespace
 #endif
const int conv_errno = std::__exchange(errno, save_errno);
 
-#if _GLIBCXX_USE_C99_FENV_TR1 && defined(FE_TONEAREST)
-   if (rounding != FE_TONEAREST)
- std::fesetround(rounding);
-#endif
-
-   ::uselocale(orig);
-   ::freelocale(loc);
-
const ptrdiff_t n = endptr - str;
if (conv_errno == ERANGE) [[unlikely]]
  {
@@ -675,8 +725,8 @@ namespace
  }
return n;
   }
-else if (errno == ENOMEM)
-  ec = errc::not_enough_memory;
+else
+  ec = loc.ec;
 
 return 0;
   }


Re: [PATCH] tree: Fix up save_expr [PR52339]

2023-05-05 Thread Jakub Jelinek via Gcc-patches
On Fri, May 05, 2023 at 11:04:09AM +0200, Jakub Jelinek via Gcc-patches wrote:
> As mentioned in the PR, save_expr seems to be very optimistic when
> some expression is invariant, which can result in various wrong-code
> issues.
> The problem is with the TREE_READONLY (t) && !TREE_SIDE_EFFECTS (t)
> case in tree_invariant_p_1.  TREE_READONLY (t) in that case says
> that the object shouldn't be modified during its lifetime and
> !TREE_SIDE_EFFECTS (t) that it can be evaluated safely multiple times,
> but that doesn't mean we can avoid wrapping the expression into SAVE_EXPR
> say for a TREE_READONLY COMPONENT_REF with INDIRECT_REF as first operand
> - either the lifetime of the TREE_READONLY object could end earlier than
> when we need to reevaluate the object (that happens in the
> pr52339-1.c case where save_expr is called on p->a and then free (p) is
> done or pr52339.C where delete a->b when calling ~B () dtor deallocates a),
> or e.g. the pointer could change as in pr52339-2.c (so evaluating p->a again
> after ++p yields a possibly different value than originally and again we need
> a SAVE_EXPR).
> 
> Attached are two patches which fix this, unfortunately both regress
> FAIL: gnat.dg/loop_optimization21.adb scan-tree-dump-times optimized 
> "Index_Check" 1
> FAIL: gnat.dg/vect1.adb scan-tree-dump-times vect "vectorized 1 loops" 15
> FAIL: gnat.dg/vect2.adb scan-tree-dump-times vect "vectorized 1 loops" 15
> FAIL: gnat.dg/vect3.adb scan-tree-dump-times vect "vectorized 1 loops" 15
> FAIL: gnat.dg/vect4.adb scan-tree-dump-times vect "vectorized 1 loops" 15
> FAIL: gnat.dg/vect5.adb scan-tree-dump-times vect "vectorized 1 loops" 15
> FAIL: gnat.dg/vect6.adb scan-tree-dump-times vect "vectorized 1 loops" 15
> on x86_64-linux (the first scan triggers 2 times rather than once,
> the next 3 13 times rather than 15 and the last 3 14 times rather than 15
> times).
> The first patch has been otherwise successfully bootstrapped/regtested on
> x86_64-linux and i686-linux (with that above regressions), the second one
> is probably better but has been so far tested just on the new testcases and
> verified to also cause the above Ada regressions.

Looking at the Ada cases (I admit I don't really understand why it isn't
vectorized; the IL is so different from the start because of the extra
SAVE_EXPRs that it is very hard to diff stuff), the cases where save_expr
used to return the argument and no longer does are those
r.P_BOUNDS->LB0
etc. cases.  Now, I wondered if (pre-gimplification) we couldn't make an
exception and allow the base to be an INDIRECT_REF of a REFERENCE_TYPE,
with the idea that references are really immutable and can't be changed
during their lifetime (after gimplification, whether something is
REFERENCE_TYPE or POINTER_TYPE is lost), but that isn't what Ada is using.

So, another possibility would be to allow bases of TREE_READONLY (t) &&
!TREE_SIDE_EFFECTS (t) which are INDIRECT_REFs of tree_invariant_p_1
addresses.  That doesn't work either: in the r.P_BOUNDS->LB0 case,
P_BOUNDS is a FIELD_DECL with POINTER_TYPE, LB0 is a TREE_READONLY FIELD_DECL
and that COMPONENT_REF is also TREE_READONLY, and r is a TREE_READONLY PARM_DECL,
but unfortunately the r.P_BOUNDS COMPONENT_REF isn't marked TREE_READONLY.

Thus, shall we also treat as tree_invariant_p_1 handled components which
are !TREE_SIDE_EFFECTS (t) but not TREE_READONLY, where only their base
is TREE_READONLY?  Or do that only during the recursion?

Jakub



RE: [PATCH 01/23] arm: [MVE intrinsics] add binary_round_lshift shape

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 01/23] arm: [MVE intrinsics] add binary_round_lshift shape
> 
> This patch adds the binary_round_lshift shape description.
> 

Ok.
I expect the series to be mostly okay given that it follows the schemes 
introduced in the previous series, but I'll review each patch individually 
anyway to make sure.
Thanks again for working on this.
Kyrill

> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-shapes.cc (binary_round_lshift): New.
>   * config/arm/arm-mve-builtins-shapes.h (binary_round_lshift): New.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 61 +++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  2 files changed, 62 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-
> mve-builtins-shapes.cc
> index 5e6681c784a..28a2d66ddd1 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -458,6 +458,67 @@ struct binary_orrq_def : public overloaded_base<0>
>  };
>  SHAPE (binary_orrq)
> 
> +/* _t vfoo[t0](_t, _t)
> +   _t vfoo[_n_t0](_t, int32_t)
> +
> +   Shape for rounding shift left operations.
> +
> +   Example: vrshlq.
> +   int8x16_t [__arm_]vrshlq[_n_s8](int8x16_t a, int32_t b)
> +   int8x16_t [__arm_]vrshlq_m_n[_s8](int8x16_t a, int32_t b, mve_pred16_t
> p)
> +   int8x16_t [__arm_]vrshlq[_s8](int8x16_t a, int8x16_t b)
> +   int8x16_t [__arm_]vrshlq_m[_s8](int8x16_t inactive, int8x16_t a, int8x16_t
> b, mve_pred16_t p)
> +   int8x16_t [__arm_]vrshlq_x[_s8](int8x16_t a, int8x16_t b, mve_pred16_t p)
> */
> +struct binary_round_lshift_def : public overloaded_base<0>
> +{
> +  bool
> +  explicit_mode_suffix_p (enum predication_index pred, enum
> mode_suffix_index mode) const override
> +  {
> +return ((mode == MODE_n)
> + && (pred == PRED_m));
> +  }
> +
> +  bool
> +  skip_overload_p (enum predication_index pred, enum mode_suffix_index
> mode) const override
> +  {
> +switch (mode)
> +  {
> +  case MODE_none:
> + return false;
> +
> + /* For MODE_n, share the overloaded instance with MODE_none,
> except for PRED_m.  */
> +  case MODE_n:
> + return pred != PRED_m;
> +
> +  default:
> + gcc_unreachable ();
> +  }
> +  }
> +
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +  bool preserve_user_namespace) const override
> +  {
> +b.add_overloaded_functions (group, MODE_none,
> preserve_user_namespace);
> +b.add_overloaded_functions (group, MODE_n,
> preserve_user_namespace);
> +build_all (b, "v0,v0,vs0", group, MODE_none, preserve_user_namespace);
> +build_all (b, "v0,v0,ss32", group, MODE_n, preserve_user_namespace,
> false, preds_m_or_none);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +unsigned int i, nargs;
> +type_suffix_index type;
> +if (!r.check_gp_argument (2, i, nargs)
> + || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES)
> +  return error_mark_node;
> +
> +return r.finish_opt_n_resolution (i, 0, type, TYPE_signed);
> +  }
> +};
> +SHAPE (binary_round_lshift)
> +
>  /* xN_t vfoo[_t0](uint64_t, uint64_t)
> 
> where there are N arguments in total.
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-
> mve-builtins-shapes.h
> index 3305d12877a..cef081aa8ec 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -37,6 +37,7 @@ namespace arm_mve
>  extern const function_shape *const binary;
>  extern const function_shape *const binary_opt_n;
>  extern const function_shape *const binary_orrq;
> +extern const function_shape *const binary_round_lshift;
>  extern const function_shape *const create;
>  extern const function_shape *const inherent;
>  extern const function_shape *const unary_convert;
> --
> 2.34.1



RE: [PATCH 02/23] arm: [MVE intrinsics] factorize vqrshlq vrshlq

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 02/23] arm: [MVE intrinsics] factorize vqrshlq vrshlq
> 
> Factorize vqrshlq, vrshlq so that they use the same pattern.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon 
> 
>   gcc/
>   * config/arm/iterators.md (MVE_RSHIFT_M_N, MVE_RSHIFT_N): New.
>   (mve_insn): Add vqrshl, vrshl.
>   * config/arm/mve.md (mve_vqrshlq_n_)
>   (mve_vrshlq_n_): Merge into ...
>   (@mve_q_n_): ... this.
>   (mve_vqrshlq_m_n_,
> mve_vrshlq_m_n_): Merge
>   into ...
>   (@mve_q_m_n_): ... this.
> ---
>  gcc/config/arm/iterators.md | 14 +++
>  gcc/config/arm/mve.md   | 49 -
>  2 files changed, 24 insertions(+), 39 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 593be83e0be..e7622fe752a 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -435,6 +435,16 @@ (define_int_iterator MVE_INT_N_BINARY_LOGIC   [
>VORRQ_N_S VORRQ_N_U
>])
> 
> +(define_int_iterator MVE_RSHIFT_M_N   [
> +  VQRSHLQ_M_N_S VQRSHLQ_M_N_U
> +  VRSHLQ_M_N_S VRSHLQ_M_N_U
> +  ])
> +
> +(define_int_iterator MVE_RSHIFT_N   [
> +  VQRSHLQ_N_S VQRSHLQ_N_U
> +  VRSHLQ_N_S VRSHLQ_N_U
> +  ])
> +
>  (define_int_iterator MVE_FP_M_BINARY   [
>VADDQ_M_F
>VMULQ_M_F
> @@ -526,7 +536,9 @@ (define_int_attr mve_insn [
>(VQRDMULHQ_M_S "vqrdmulh")
>(VQRDMULHQ_N_S "vqrdmulh")
>(VQRDMULHQ_S "vqrdmulh")
> +  (VQRSHLQ_M_N_S "vqrshl") (VQRSHLQ_M_N_U "vqrshl")
>(VQRSHLQ_M_S "vqrshl") (VQRSHLQ_M_U "vqrshl")
> +  (VQRSHLQ_N_S "vqrshl") (VQRSHLQ_N_U "vqrshl")
>(VQRSHLQ_S "vqrshl") (VQRSHLQ_U "vqrshl")
>(VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
>(VQSHLQ_S "vqshl") (VQSHLQ_U "vqshl")
> @@ -538,7 +550,9 @@ (define_int_attr mve_insn [
>(VRHADDQ_S "vrhadd") (VRHADDQ_U "vrhadd")
>(VRMULHQ_M_S "vrmulh") (VRMULHQ_M_U "vrmulh")
>(VRMULHQ_S "vrmulh") (VRMULHQ_U "vrmulh")
> +  (VRSHLQ_M_N_S "vrshl") (VRSHLQ_M_N_U "vrshl")
>(VRSHLQ_M_S "vrshl") (VRSHLQ_M_U "vrshl")
> +  (VRSHLQ_N_S "vrshl") (VRSHLQ_N_U "vrshl")
>(VRSHLQ_S "vrshl") (VRSHLQ_U "vrshl")
>(VSHLQ_M_S "vshl") (VSHLQ_M_U "vshl")
>(VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub")
> (VSUBQ_M_N_F "vsub")
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 6b88fdb8a7a..0d3343b6e29 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -1373,17 +1373,18 @@ (define_expand "mve_vorrq_u"
>  )
> 
>  ;;
> -;; [vqrshlq_n_s, vqrshlq_n_u])
> +;; [vqrshlq_n_s, vqrshlq_n_u]
> +;; [vrshlq_n_u, vrshlq_n_s]
>  ;;
> -(define_insn "mve_vqrshlq_n_"
> +(define_insn "@mve_q_n_"
>[
> (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
>  (match_operand:SI 2 "s_register_operand" "r")]
> -  VQRSHLQ_N))
> +  MVE_RSHIFT_N))
>]
>"TARGET_HAVE_MVE"
> -  "vqrshl.%#\t%q0, %2"
> +  ".%#\t%q0, %2"
>[(set_attr "type" "mve_move")
>  ])
> 
> @@ -1432,21 +1433,6 @@ (define_insn "mve_vqshluq_n_s"
>[(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vrshlq_n_u, vrshlq_n_s])
> -;;
> -(define_insn "mve_vrshlq_n_"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> - (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -(match_operand:SI 2 "s_register_operand" "r")]
> -  VRSHLQ_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vrshl.%#\t%q0, %2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vrshrq_n_s, vrshrq_n_u])
>  ;;
> @@ -3098,18 +3084,19 @@ (define_insn "mve_vqrdmlsdhxq_s"
>  ])
> 
>  ;;
> -;; [vqrshlq_m_n_s, vqrshlq_m_n_u])
> +;; [vqrshlq_m_n_s, vqrshlq_m_n_u]
> +;; [vrshlq_m_n_s, vrshlq_m_n_u]
>  ;;
> -(define_insn "mve_vqrshlq_m_n_"
> +(define_insn "@mve_q_m_n_"
>[
> (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
>  (match_operand:SI 2 "s_register_operand" "r")
>  (match_operand: 3
> "vpr_register_operand" "Up")]
> -  VQRSHLQ_M_N))
> +  MVE_RSHIFT_M_N))
>]
>"TARGET_HAVE_MVE"
> -  "vpst\;vqrshlt.%# %q0, %2"
> +  "vpst\;t.%#\t%q0, %2"
>[(set_attr "type" "mve_move")
> (set_attr "length""8")])
> 
> @@ -3145,22 +3132,6 @@ (define_insn "mve_vrev64q_m_"
>[(set_attr "type" "mve_move")
> (set_a

RE: [PATCH 03/23] arm: [MVE intrinsics] rework vrshlq vqrshlq

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 03/23] arm: [MVE intrinsics] rework vrshlq vqrshlq
> 
> Implement vrshlq, vqrshlq using the new MVE builtins framework.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon 
> 
>   gcc/
>   * config/arm/arm-mve-builtins-base.cc (vqrshlq, vrshlq): New.
>   * config/arm/arm-mve-builtins-base.def (vqrshlq, vrshlq): New.
>   * config/arm/arm-mve-builtins-base.h (vqrshlq, vrshlq): New.
>   * config/arm/arm-mve-builtins.cc (has_inactive_argument): Handle
>   vqrshlq, vrshlq.
>   * config/arm/arm_mve.h (vrshlq): Remove.
>   (vrshlq_m_n): Remove.
>   (vrshlq_m): Remove.
>   (vrshlq_x): Remove.
>   (vrshlq_u8): Remove.
>   (vrshlq_n_u8): Remove.
>   (vrshlq_s8): Remove.
>   (vrshlq_n_s8): Remove.
>   (vrshlq_u16): Remove.
>   (vrshlq_n_u16): Remove.
>   (vrshlq_s16): Remove.
>   (vrshlq_n_s16): Remove.
>   (vrshlq_u32): Remove.
>   (vrshlq_n_u32): Remove.
>   (vrshlq_s32): Remove.
>   (vrshlq_n_s32): Remove.
>   (vrshlq_m_n_u8): Remove.
>   (vrshlq_m_n_s8): Remove.
>   (vrshlq_m_n_u16): Remove.
>   (vrshlq_m_n_s16): Remove.
>   (vrshlq_m_n_u32): Remove.
>   (vrshlq_m_n_s32): Remove.
>   (vrshlq_m_s8): Remove.
>   (vrshlq_m_s32): Remove.
>   (vrshlq_m_s16): Remove.
>   (vrshlq_m_u8): Remove.
>   (vrshlq_m_u32): Remove.
>   (vrshlq_m_u16): Remove.
>   (vrshlq_x_s8): Remove.
>   (vrshlq_x_s16): Remove.
>   (vrshlq_x_s32): Remove.
>   (vrshlq_x_u8): Remove.
>   (vrshlq_x_u16): Remove.
>   (vrshlq_x_u32): Remove.
>   (__arm_vrshlq_u8): Remove.
>   (__arm_vrshlq_n_u8): Remove.
>   (__arm_vrshlq_s8): Remove.
>   (__arm_vrshlq_n_s8): Remove.
>   (__arm_vrshlq_u16): Remove.
>   (__arm_vrshlq_n_u16): Remove.
>   (__arm_vrshlq_s16): Remove.
>   (__arm_vrshlq_n_s16): Remove.
>   (__arm_vrshlq_u32): Remove.
>   (__arm_vrshlq_n_u32): Remove.
>   (__arm_vrshlq_s32): Remove.
>   (__arm_vrshlq_n_s32): Remove.
>   (__arm_vrshlq_m_n_u8): Remove.
>   (__arm_vrshlq_m_n_s8): Remove.
>   (__arm_vrshlq_m_n_u16): Remove.
>   (__arm_vrshlq_m_n_s16): Remove.
>   (__arm_vrshlq_m_n_u32): Remove.
>   (__arm_vrshlq_m_n_s32): Remove.
>   (__arm_vrshlq_m_s8): Remove.
>   (__arm_vrshlq_m_s32): Remove.
>   (__arm_vrshlq_m_s16): Remove.
>   (__arm_vrshlq_m_u8): Remove.
>   (__arm_vrshlq_m_u32): Remove.
>   (__arm_vrshlq_m_u16): Remove.
>   (__arm_vrshlq_x_s8): Remove.
>   (__arm_vrshlq_x_s16): Remove.
>   (__arm_vrshlq_x_s32): Remove.
>   (__arm_vrshlq_x_u8): Remove.
>   (__arm_vrshlq_x_u16): Remove.
>   (__arm_vrshlq_x_u32): Remove.
>   (__arm_vrshlq): Remove.
>   (__arm_vrshlq_m_n): Remove.
>   (__arm_vrshlq_m): Remove.
>   (__arm_vrshlq_x): Remove.
>   (vqrshlq): Remove.
>   (vqrshlq_m_n): Remove.
>   (vqrshlq_m): Remove.
>   (vqrshlq_u8): Remove.
>   (vqrshlq_n_u8): Remove.
>   (vqrshlq_s8): Remove.
>   (vqrshlq_n_s8): Remove.
>   (vqrshlq_u16): Remove.
>   (vqrshlq_n_u16): Remove.
>   (vqrshlq_s16): Remove.
>   (vqrshlq_n_s16): Remove.
>   (vqrshlq_u32): Remove.
>   (vqrshlq_n_u32): Remove.
>   (vqrshlq_s32): Remove.
>   (vqrshlq_n_s32): Remove.
>   (vqrshlq_m_n_u8): Remove.
>   (vqrshlq_m_n_s8): Remove.
>   (vqrshlq_m_n_u16): Remove.
>   (vqrshlq_m_n_s16): Remove.
>   (vqrshlq_m_n_u32): Remove.
>   (vqrshlq_m_n_s32): Remove.
>   (vqrshlq_m_s8): Remove.
>   (vqrshlq_m_s32): Remove.
>   (vqrshlq_m_s16): Remove.
>   (vqrshlq_m_u8): Remove.
>   (vqrshlq_m_u32): Remove.
>   (vqrshlq_m_u16): Remove.
>   (__arm_vqrshlq_u8): Remove.
>   (__arm_vqrshlq_n_u8): Remove.
>   (__arm_vqrshlq_s8): Remove.
>   (__arm_vqrshlq_n_s8): Remove.
>   (__arm_vqrshlq_u16): Remove.
>   (__arm_vqrshlq_n_u16): Remove.
>   (__arm_vqrshlq_s16): Remove.
>   (__arm_vqrshlq_n_s16): Remove.
>   (__arm_vqrshlq_u32): Remove.
>   (__arm_vqrshlq_n_u32): Remove.
>   (__arm_vqrshlq_s32): Remove.
>   (__arm_vqrshlq_n_s32): Remove.
>   (__arm_vqrshlq_m_n_u8): Remove.
>   (__arm_vqrshlq_m_n_s8): Remove.
>   (__arm_vqrshlq_m_n_u16): Remove.
>   (__arm_vqrshlq_m_n_s16): Remove.
>   (__arm_vqrshlq_m_n_u32): Remove.
>   (__arm_vqrshlq_m_n_s32): Remove.
>   (__arm_vqrshlq_m_s8): Remove.
>   (__arm_vqrshlq_m_s32): Remove.
>   (__arm_vqrshlq_m_s16): Remove.
>   (__arm_vqrshlq_m_u8): Remove.
>   (__arm_vqrshlq_m_u32): Remove.
>   (__arm_vqrshlq_m_u16): Remove.
>   (__arm_vqrshlq): Remove.
>   (__arm_vqrshlq_m_n): Remove.
>   (__arm_vqrshlq_m): Remove.
> ---
>  gcc/config/a

RE: [PATCH 04/23] arm: [MVE intrinsics] factorize vqshlq vshlq

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 04/23] arm: [MVE intrinsics] factorize vqshlq vshlq
> 
> Factorize vqshlq and vshlq so that they use the same pattern.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon 
> 
>   gcc/
>   * config/arm/iterators.md (MVE_SHIFT_M_R, MVE_SHIFT_M_N)
>   (MVE_SHIFT_N, MVE_SHIFT_R): New.
>   (mve_insn): Add vqshl, vshl.
>   * config/arm/mve.md (mve_vqshlq_n_)
>   (mve_vshlq_n_): Merge into ...
>   (@mve_q_n_): ... this.
>   (mve_vqshlq_r_, mve_vshlq_r_): Merge
> into
>   ...
>   (@mve_q_r_): ... this.
>   (mve_vqshlq_m_r_, mve_vshlq_m_r_):
> Merge
>   into ...
>   (@mve_q_m_r_): ... this.
>   (mve_vqshlq_m_n_,
> mve_vshlq_m_n_): Merge
>   into ...
>   (@mve_q_m_n_): ... this.
>   * config/arm/vec-common.md (mve_vshlq_):
> Transform
>   into ...
>   (@mve_q_): ... this.
> ---
>  gcc/config/arm/iterators.md  | 29 +++
>  gcc/config/arm/mve.md| 99 
>  gcc/config/arm/vec-common.md |  4 +-
>  3 files changed, 51 insertions(+), 81 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index e7622fe752a..c53b42a86e9 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -435,6 +435,26 @@ (define_int_iterator MVE_INT_N_BINARY_LOGIC   [
>VORRQ_N_S VORRQ_N_U
>])
> 
> +(define_int_iterator MVE_SHIFT_M_R   [
> +  VQSHLQ_M_R_S VQSHLQ_M_R_U
> +  VSHLQ_M_R_S VSHLQ_M_R_U
> +  ])
> +
> +(define_int_iterator MVE_SHIFT_M_N   [
> +  VQSHLQ_M_N_S VQSHLQ_M_N_U
> +  VSHLQ_M_N_S VSHLQ_M_N_U
> +  ])
> +
> +(define_int_iterator MVE_SHIFT_N   [
> +  VQSHLQ_N_S VQSHLQ_N_U
> +  VSHLQ_N_S VSHLQ_N_U
> +  ])
> +
> +(define_int_iterator MVE_SHIFT_R   [
> +  VQSHLQ_R_S VQSHLQ_R_U
> +  VSHLQ_R_S VSHLQ_R_U
> +  ])
> +
>  (define_int_iterator MVE_RSHIFT_M_N   [
>VQRSHLQ_M_N_S VQRSHLQ_M_N_U
>VRSHLQ_M_N_S VRSHLQ_M_N_U
> @@ -540,7 +560,11 @@ (define_int_attr mve_insn [
>(VQRSHLQ_M_S "vqrshl") (VQRSHLQ_M_U "vqrshl")
>(VQRSHLQ_N_S "vqrshl") (VQRSHLQ_N_U "vqrshl")
>(VQRSHLQ_S "vqrshl") (VQRSHLQ_U "vqrshl")
> +  (VQSHLQ_M_N_S "vqshl") (VQSHLQ_M_N_U "vqshl")
> +  (VQSHLQ_M_R_S "vqshl") (VQSHLQ_M_R_U "vqshl")
>(VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
> +  (VQSHLQ_N_S "vqshl") (VQSHLQ_N_U "vqshl")
> +  (VQSHLQ_R_S "vqshl") (VQSHLQ_R_U "vqshl")
>(VQSHLQ_S "vqshl") (VQSHLQ_U "vqshl")
>(VQSUBQ_M_N_S "vqsub") (VQSUBQ_M_N_U "vqsub")
>(VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
> @@ -554,7 +578,12 @@ (define_int_attr mve_insn [
>(VRSHLQ_M_S "vrshl") (VRSHLQ_M_U "vrshl")
>(VRSHLQ_N_S "vrshl") (VRSHLQ_N_U "vrshl")
>(VRSHLQ_S "vrshl") (VRSHLQ_U "vrshl")
> +  (VSHLQ_M_N_S "vshl") (VSHLQ_M_N_U "vshl")
> +  (VSHLQ_M_R_S "vshl") (VSHLQ_M_R_U "vshl")
>(VSHLQ_M_S "vshl") (VSHLQ_M_U "vshl")
> +  (VSHLQ_N_S "vshl") (VSHLQ_N_U "vshl")
> +  (VSHLQ_R_S "vshl") (VSHLQ_R_U "vshl")
> +  (VSHLQ_S "vshl") (VSHLQ_U "vshl")
>(VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub")
> (VSUBQ_M_N_F "vsub")
>(VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F
> "vsub")
>(VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F
> "vsub")
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 0d3343b6e29..fb1076aef73 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -1389,32 +1389,34 @@ (define_insn
> "@mve_q_n_"
>  ])
> 
>  ;;
> -;; [vqshlq_n_s, vqshlq_n_u])
> +;; [vqshlq_n_s, vqshlq_n_u]
> +;; [vshlq_n_u, vshlq_n_s]
>  ;;
> -(define_insn "mve_vqshlq_n_"
> +(define_insn "@mve_q_n_"
>[
> (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand"
> "w")
>  (match_operand:SI 2 "immediate_operand" "i")]
> -  VQSHLQ_N))
> +  MVE_SHIFT_N))
>]
>"TARGET_HAVE_MVE"
> -  "vqshl.%#\t%q0, %q1, %2"
> +  ".%#\t%q0, %q1, %2"
>[(set_attr "type" "mve_move")
>  ])
> 
>  ;;
> -;; [vqshlq_r_u, vqshlq_r_s])
> +;; [vqshlq_r_u, vqshlq_r_s]
> +;; [vshlq_r_s, vshlq_r_u]
>  ;;
> -(define_insn "mve_vqshlq_r_"
> +(define_insn "@mve_q_r_"
>[
> (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
>  (m

RE: [PATCH 05/23] arm: [MVE intrinsics] rework vqrdmulhq

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 05/23] arm: [MVE intrinsics] rework vqrdmulhq
> 
> Implement vqrdmulhq using the new MVE builtins framework.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon 
> 
>   gcc/
>   * config/arm/arm-mve-builtins-base.cc (vqrdmulhq): New.
>   * config/arm/arm-mve-builtins-base.def (vqrdmulhq): New.
>   * config/arm/arm-mve-builtins-base.h (vqrdmulhq): New.
>   * config/arm/arm_mve.h (vqrdmulhq): Remove.
>   (vqrdmulhq_m): Remove.
>   (vqrdmulhq_s8): Remove.
>   (vqrdmulhq_n_s8): Remove.
>   (vqrdmulhq_s16): Remove.
>   (vqrdmulhq_n_s16): Remove.
>   (vqrdmulhq_s32): Remove.
>   (vqrdmulhq_n_s32): Remove.
>   (vqrdmulhq_m_n_s8): Remove.
>   (vqrdmulhq_m_n_s32): Remove.
>   (vqrdmulhq_m_n_s16): Remove.
>   (vqrdmulhq_m_s8): Remove.
>   (vqrdmulhq_m_s32): Remove.
>   (vqrdmulhq_m_s16): Remove.
>   (__arm_vqrdmulhq_s8): Remove.
>   (__arm_vqrdmulhq_n_s8): Remove.
>   (__arm_vqrdmulhq_s16): Remove.
>   (__arm_vqrdmulhq_n_s16): Remove.
>   (__arm_vqrdmulhq_s32): Remove.
>   (__arm_vqrdmulhq_n_s32): Remove.
>   (__arm_vqrdmulhq_m_n_s8): Remove.
>   (__arm_vqrdmulhq_m_n_s32): Remove.
>   (__arm_vqrdmulhq_m_n_s16): Remove.
>   (__arm_vqrdmulhq_m_s8): Remove.
>   (__arm_vqrdmulhq_m_s32): Remove.
>   (__arm_vqrdmulhq_m_s16): Remove.
>   (__arm_vqrdmulhq): Remove.
>   (__arm_vqrdmulhq_m): Remove.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc  |   1 +
>  gcc/config/arm/arm-mve-builtins-base.def |   1 +
>  gcc/config/arm/arm-mve-builtins-base.h   |   1 +
>  gcc/config/arm/arm_mve.h | 213 ---
>  4 files changed, 3 insertions(+), 213 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-
> mve-builtins-base.cc
> index f5e48519b19..8c125657c67 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -158,6 +158,7 @@ FUNCTION_WITH_RTX_M_N_NO_N_F (vorrq, IOR,
> VORRQ)
>  FUNCTION_WITH_M_N_NO_F (vqaddq, VQADDQ)
>  FUNCTION_WITH_M_N_NO_U_F (vqdmulhq, VQDMULHQ)
>  FUNCTION_WITH_M_N_NO_F (vqrshlq, VQRSHLQ)
> +FUNCTION_WITH_M_N_NO_U_F (vqrdmulhq, VQRDMULHQ)
>  FUNCTION_WITH_M_N_NO_F (vqsubq, VQSUBQ)
>  FUNCTION (vreinterpretq, vreinterpretq_impl,)
>  FUNCTION_WITHOUT_N_NO_F (vrhaddq, VRHADDQ)
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-
> mve-builtins-base.def
> index e6dc2b00aaa..5b9966341ce 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.def
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -29,6 +29,7 @@ DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_integer,
> mx_or_none)
>  DEF_MVE_FUNCTION (vorrq, binary_orrq, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vqaddq, binary_opt_n, all_integer, m_or_none)
>  DEF_MVE_FUNCTION (vqdmulhq, binary_opt_n, all_signed, m_or_none)
> +DEF_MVE_FUNCTION (vqrdmulhq, binary_opt_n, all_signed, m_or_none)
>  DEF_MVE_FUNCTION (vqrshlq, binary_round_lshift, all_integer, m_or_none)
>  DEF_MVE_FUNCTION (vqsubq, binary_opt_n, all_integer, m_or_none)
>  DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer,
> none)
> diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-
> mve-builtins-base.h
> index 31ba3fece82..eeb747d52ad 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.h
> +++ b/gcc/config/arm/arm-mve-builtins-base.h
> @@ -34,6 +34,7 @@ extern const function_base *const vmulq;
>  extern const function_base *const vorrq;
>  extern const function_base *const vqaddq;
>  extern const function_base *const vqdmulhq;
> +extern const function_base *const vqrdmulhq;
>  extern const function_base *const vqrshlq;
>  extern const function_base *const vqsubq;
>  extern const function_base *const vreinterpretq;
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 636945d6ef0..44b383dbe08 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -94,7 +94,6 @@
>  #define vcmpgtq(__a, __b) __arm_vcmpgtq(__a, __b)
>  #define vcmpgeq(__a, __b) __arm_vcmpgeq(__a, __b)
>  #define vqshluq(__a, __imm) __arm_vqshluq(__a, __imm)
> -#define vqrdmulhq(__a, __b) __arm_vqrdmulhq(__a, __b)
>  #define vmlsdavxq(__a, __b) __arm_vmlsdavxq(__a, __b)
>  #define vmlsdavq(__a, __b) __arm_vmlsdavq(__a, __b)
>  #define vmladavxq(__a, __b) __arm_vmladavxq(__a, __b)
> @@ -249,7 +248,6 @@
>  #define vqrdmlashq_m(__a, __b, __c, __p) __arm_vqrdmlashq_m(__a, __b,
> __c, __p)
>  #define vqrdmlsdhq_m(__inactive, __a, __b, __p)
> __arm_vqrdmlsdhq_m(__inactive, __a, __b, __p)
>  #define vqrdmlsdhxq_m(__inactive, __a, __b, __p)
> __arm_vqrdmlsdhxq_m(__inactive, __a, __b, __p)
> -#define vqrdmulhq_m(__inactive, __a, __b, __p)
> __arm_vqrdmulhq_m(__inactive, __a, __

RE: [PATCH] RISC-V: Legitimise the const0_rtx for RVV indexed load/store

2023-05-05 Thread Li, Pan2 via Gcc-patches
Updated to PATCH v2; the x86 bootstrap and regression tests passed.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617449.html

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Li, Pan2 via Gcc-patches
Sent: Thursday, May 4, 2023 4:44 PM
To: juzhe.zh...@rivai.ai; gcc-patches 
Cc: Kito.cheng ; Wang, Yanzhang 
Subject: RE: [PATCH] RISC-V: Legitimise the const0_rtx for RVV indexed 
load/store

Thanks Juzhe, make sense, let me update it soon.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, May 4, 2023 4:40 PM
To: Li, Pan2 ; gcc-patches 
Cc: Kito.cheng ; Li, Pan2 ; Wang, 
Yanzhang 
Subject: Re: [PATCH] RISC-V: Legitimise the const0_rtx for RVV indexed 
load/store

vluxei32.v  v1,(0),v1 is not correct assembly.
Instead,  it should be vluxei32.v  v1,(zero),v1

You should change the assembly print: (%1) --> (%z1)


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-05-04 16:35
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng; pan2.li; 
yanzhang.wang
Subject: [PATCH] RISC-V: Legitimise the const0_rtx for RVV indexed load/store
From: Pan Li <pan2...@intel.com>

This patch tries to legitimise the const0_rtx (aka the zero register) as the
base register for the RVV indexed load/store instructions, by allowing the
const as the operand of the indexed RTL pattern.
Then the underlying combine pass will try to perform the const propagation.

For example:
vint32m1_t
test_vluxei32_v_i32m1_shortcut (vuint32m1_t bindex, size_t vl) {
  return __riscv_vluxei32_v_i32m1 ((int32_t *)0, bindex, vl); }

Before this patch:
li          a5,0          <- can be eliminated.
vl1re32.v   v1,0(a1)
vsetvli     zero,a2,e32,m1,ta,ma
vluxei32.v  v1,(a5),v1    <- can propagate the const 0 to a5 here.
vs1r.v      v1,0(a0)
ret

After this patch:
test_vluxei32_v_i32m1_shortcut:
vl1re32.v   v1,0(a1)
vsetvli zero,a2,e32,m1,ta,ma
vluxei32.v  v1,(0),v1
vs1r.v  v1,0(a0)
ret

As above, this patch allows you to propagate the const 0 (aka the zero
register) to the base register of the RVV indexed load in the combine pass.
This may benefit the underlying RVV auto-vectorization.

Signed-off-by: Pan Li <pan2...@intel.com>
Co-authored-by: Ju-Zhe Zhong <juzhe.zh...@rivai.ai>

gcc/ChangeLog:

* config/riscv/vector.md: Allow const as the operand of RVV
  indexed load/store.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c:
  Adjust indexed load/store check condition.
---
gcc/config/riscv/vector.md| 32 +--
.../base/zero_base_load_store_optimization.c  |  3 +-
2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 92115e3935f..c3210eacd47 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1511,7 +1511,7 @@ (define_insn "@pred_indexed_load_same_eew"
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (unspec:V
- [(match_operand 3 "pmode_register_operand""  r,  r, r,  r")
+ [(match_operand 3 "pmode_reg_or_0_operand"" rJ, rJ,rJ, rJ")
 (mem:BLK (scratch))
 (match_operand: 4 "register_operand" " vr, vr,vr, vr")] ORDER)
  (match_operand:V 2 "vector_merge_operand"   " vu, vu, 0,  0")))]
@@ -1533,7 +1533,7 @@ (define_insn 
"@pred_indexed_load_x2_greater_eew"
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (unspec:VEEWEXT2
- [(match_operand 3 "pmode_register_operand" "r,r")
+ [(match_operand 3 "pmode_reg_or_0_operand" "   rJ,   rJ")
 (mem:BLK (scratch))
 (match_operand: 4 "register_operand" "   vr,   vr")] 
ORDER)
  (match_operand:VEEWEXT2 2 "vector_merge_operand" "   vu,0")))]
@@ -1554,7 +1554,7 @@ (define_insn 
"@pred_indexed_load_x4_greater_eew"
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (unspec:VEEWEXT4
- [(match_operand 3 "pmode_register_operand" "r,r")
+ [(match_operand 3 "pmode_reg_or_0_operand" "   rJ,   rJ")
 (mem:BLK (scratch))
 (match_operand: 4 "register_operand"   "   vr,   vr")] 
ORDER)
  (match_operand:VEEWEXT4 2 "vector_merge_operand" "   vu,0")))]
@@ -1575,7 +1575,7 @@ (define_insn 
"@pred_indexed_load_x8_greater_eew"
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (unspec:VEEWEXT8
- [(match_operand 3 "pmode_register_operand" "r,r")
+ [(match_operand 3 "pmode_reg_or_0_operand" "   rJ,   rJ")
 (mem:BLK (scratch))
 (match_operand: 4 "register_operand""   vr,   vr")] 
ORDER)
  (match_operand:VEEWEXT8 2 "vector_merge_operand" "   vu,0")))]
@@ -1597,7 +1597,7 @@ (d

Re: [libstdc++] use strtold for from_chars even without locale

2023-05-05 Thread Alexandre Oliva via Gcc-patches
Here's a patch to skip/xfail the bits that are expected to fail on
aarch64-vxworks.


[libstdc++] [testsuite] xfail double-prec from_chars for ldbl

When long double is wider than double, but from_chars is implemented
in terms of double, tests that involve the full precision of long
double are expected to fail.  Mark them as such on aarch64-*-vxworks.
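For illustration, a small hedged C sketch of why a double-precision fallback cannot satisfy full-precision long double tests; it assumes a target where long double is wider than double (e.g. the x86 80-bit format), and the helper name is invented for this example:

```c
#include <stdlib.h>
#include <assert.h>

/* Returns 1 if parsing S at double precision loses information that
   full long double parsing preserves.  On targets where long double
   is wider than double, a string like "1e400" overflows double
   (strtod returns HUGE_VAL) yet is exactly representable as a long
   double, so a from_chars implemented via double parsing must fail
   tests exercising the extra range/precision.  */
int
double_fallback_loses (const char *s)
{
  double d = strtod (s, NULL);        /* what a double-based fallback sees */
  long double ld = strtold (s, NULL); /* full long double parsing */
  return (long double) d != ld;
}
```

On a target where long double and double have the same format, the function would always return 0, which is exactly why the xfail below is keyed to aarch64-vxworks rather than applied unconditionally.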


for  libstdc++-v3/ChangeLog

* testsuite/20_util/from_chars/4.cc: Skip long double test06
on aarch64-vxworks.
* testsuite/20_util/to_chars/long_double.cc: Xfail run on
aarch64-vxworks.
---
 libstdc++-v3/testsuite/20_util/from_chars/4.cc |3 ++-
 .../testsuite/20_util/to_chars/long_double.cc  |4 
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/20_util/from_chars/4.cc 
b/libstdc++-v3/testsuite/20_util/from_chars/4.cc
index dd55690eb6511..c3594f9014bd3 100644
--- a/libstdc++-v3/testsuite/20_util/from_chars/4.cc
+++ b/libstdc++-v3/testsuite/20_util/from_chars/4.cc
@@ -18,6 +18,7 @@
 //  is supported in C++14 as a GNU extension
 // { dg-do run { target c++14 } }
 // { dg-add-options ieee }
+// { dg-additional-options "-DSKIP_LONG_DOUBLE" { target aarch64-*-vxworks* } }
 
 #include 
 #include 
@@ -354,7 +355,7 @@ test06()
 {
   test_max_mantissa();
   test_max_mantissa();
-#ifdef __GLIBCXX_TYPE_INT_N_0
+#if defined __GLIBCXX_TYPE_INT_N_0 && !defined SKIP_LONG_DOUBLE
   test_max_mantissa();
 #endif
 }
diff --git a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc 
b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
index 880c98021876d..263144bd42cba 100644
--- a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
+++ b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
@@ -34,6 +34,10 @@
 // more portable and robust to differences in system printf behavior.
 // { dg-xfail-run-if "Non-conforming printf (see PR98384)" { *-*-solaris* 
*-*-darwin* } }
 
+// On systems that use double-precision from_chars for long double,
+// this is expected to fail.
+// { dg-xfail-run-if "from_chars limited to double-precision" { 
aarch64-*-vxworks* } }
+
 // { dg-require-effective-target ieee_floats }
 // { dg-require-effective-target size32plus }
 // { dg-require-cmath "" }


-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] tree: Fix up save_expr [PR52339]

2023-05-05 Thread Jakub Jelinek via Gcc-patches
On Fri, May 05, 2023 at 11:55:41AM +0200, Jakub Jelinek via Gcc-patches wrote:
> Looking at the Ada cases (I admit I don't really understand why it isn't
> vectorized, the IL is so different from the start because of the extra
> SAVE_EXPRs that it is very hard to diff stuff), the case where save_expr
> used to return the argument and no longer does are those
> r.P_BOUNDS->LB0
> etc. cases.  Now, I wondered if (pre-gimplification) we couldn't make an
> exception and allow the base to be INDIRECT_REF or of a REFERENCE_TYPE
> with the idea that references are really immutable and can't be changed
> during its lifetime (after gimplification whether something is
> REFERENCE_TYPE or POINTER_TYPE is lost), but that isn't what Ada is using.
> 
> So, another possibility would be to allow bases of TREE_READONLY (t) &&
> !TREE_SIDE_EFFECTS (t) which are INDIRECT_REFs of tree_invariant_p_1
> addresses.  That doesn't work either, in the r.P_BOUNDS->LB0 case
> P_BOUNDS is a FIELD_DECL with POINTER_TYPE, LB0 is TREE_READONLY FIELD_DECL
> and that COMPONENT_REF is  also TREE_READONLY, r is TREE_READONLY PARM_DECL,
> but unfortunately the r.P_BOUNDS COMPONENT_REF isn't marked TREE_READONLY.
> 
> Thus, shall we treat as tree_invariant_p_1 also handled components which
> are !TREE_SIDE_EFFECTS (t), but not TREE_READONLY and only their base
> is TREE_READONLY?  Or do that only during the recursion?

But doing that feels quite risky.  While the following version of
the patch avoids the Ada regressions, the fact that we don't miscompile
the pr52339-1.c testcase modified to have
int
foo (const struct S *const p, struct S *q)
rather than
int
foo (const struct S *p, struct S *q)
is only because the FE happens to add a useless cast there in between.
While the pointer is invariant, I'm afraid nothing guarantees that what it
points to stays live in between multiple uses of the expression returned
by save_expr.

2023-05-05  Jakub Jelinek  

PR c++/52339
* tree.cc (tree_invariant_p_1): For TREE_READONLY (t) without
side-effects, only return true if DECL_P (get_base_address (t)).

* g++.dg/opt/pr52339.C: New test.
* gcc.c-torture/execute/pr52339-1.c: New test.
* gcc.c-torture/execute/pr52339-2.c: New test.

--- gcc/tree.cc.jj  2023-05-01 09:59:46.686293833 +0200
+++ gcc/tree.cc 2023-05-05 12:34:26.989523468 +0200
@@ -3876,10 +3876,26 @@ tree_invariant_p_1 (tree t)
 {
   tree op;
 
-  if (TREE_CONSTANT (t)
-  || (TREE_READONLY (t) && !TREE_SIDE_EFFECTS (t)))
+  if (TREE_CONSTANT (t))
 return true;
 
+  if (TREE_READONLY (t) && !TREE_SIDE_EFFECTS (t))
+{
+  /* Return true for const qualified vars, but for members or array
+elements without side-effects return true only if the base
+object is a decl.  If the base is e.g. a pointer dereference,
+what the pointer points to could be deallocated or the pointer
+could be changed.  See PR52339.  */
+  tree base = get_base_address (t);
+  if (DECL_P (base))
+   return true;
+  /* As an exception, allow pointer dereferences as long as the pointer
+is invariant.  */
+  if (TREE_CODE (base) == INDIRECT_REF
+ && tree_invariant_p_1 (get_base_address (TREE_OPERAND (base, 0
+   return true;
+}
+
   switch (TREE_CODE (t))
 {
 case SAVE_EXPR:
--- gcc/testsuite/g++.dg/opt/pr52339.C.jj   2023-05-04 15:23:20.459935705 
+0200
+++ gcc/testsuite/g++.dg/opt/pr52339.C  2023-05-04 15:22:35.640578681 +0200
@@ -0,0 +1,19 @@
+// PR c++/52339
+// { dg-do run { target c++11 } }
+
+
+struct B;
+struct A { B *b; };
+struct B {
+  A *a;
+  B () : a(new A{this}) {}
+  ~B () { delete a; }
+};
+ 
+int
+main ()
+{
+  B *b = new B;
+  const A *a = b->a;
+  delete a->b;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr52339-1.c.jj  2023-05-04 
15:22:59.177241023 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr52339-1.c 2023-05-04 
15:20:19.820527142 +0200
@@ -0,0 +1,29 @@
+/* PR c++/52339 */
+
+struct S { int a; };
+
+void
+bar (int *p, struct S *q)
+{
+  __builtin_free (q);
+}
+
+int
+foo (const struct S *p, struct S *q)
+{
+  int b[p->a];
+  bar (b, q);
+  return sizeof (b);
+}
+
+int
+main ()
+{
+  struct S *p = __builtin_malloc (sizeof (struct S));
+  if (!p)
+return 0;
+  p->a = 42;
+  if (foo (p, p) != 42 * sizeof (int))
+__builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr52339-2.c.jj  2022-11-21 
10:04:00.210677046 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr52339-2.c 2023-05-04 
19:34:08.581686806 +0200
@@ -0,0 +1,20 @@
+/* PR c++/52339 */
+
+struct S { int a; };
+
+int
+foo (const struct S *p)
+{
+  int b[p->a];
+  ++p;
+  return sizeof (b);
+}
+
+int
+main ()
+{
+  struct S s[] = { { 42 }, { 43 } };
+  if (foo (s) != 42 * sizeof (int))
+__builtin_abort ();
+  return 0;
+}


Jakub



RE: [PATCH 06/23] arm: [MVE intrinsics] factorize vabdq

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 06/23] arm: [MVE intrinsics] factorize vabdq
> 
> 2022-09-08  Christophe Lyon 
> 
>   gcc/
>   * config/arm/iterators.md (MVE_FP_M_BINARY): Add vabdq.
>   (MVE_FP_VABDQ_ONLY): New.
>   (mve_insn): Add vabd.
>   * config/arm/mve.md (mve_vabdq_f): Move into ...
>   (@mve_q_f): ... this.
>   (mve_vabdq_m_f): Remove.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/iterators.md |  9 +++--
>  gcc/config/arm/mve.md   | 25 +
>  2 files changed, 12 insertions(+), 22 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index c53b42a86e9..3133642ea82 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -466,6 +466,7 @@ (define_int_iterator MVE_RSHIFT_N   [
>])
> 
>  (define_int_iterator MVE_FP_M_BINARY   [
> +  VABDQ_M_F
>VADDQ_M_F
>VMULQ_M_F
>VSUBQ_M_F
> @@ -490,6 +491,10 @@ (define_int_iterator MVE_FP_N_BINARY   [
>VSUBQ_N_F
>])
> 
> +(define_int_iterator MVE_FP_VABDQ_ONLY [
> +  VABDQ_F
> +  ])
> +
>  (define_int_iterator MVE_FP_CREATE_ONLY [
>VCREATEQ_F
>])
> @@ -501,8 +506,8 @@ (define_code_attr mve_addsubmul [
>])
> 
>  (define_int_attr mve_insn [
> -  (VABDQ_M_S "vabd") (VABDQ_M_U "vabd")
> -  (VABDQ_S "vabd") (VABDQ_U "vabd")
> +  (VABDQ_M_S "vabd") (VABDQ_M_U "vabd") (VABDQ_M_F
> "vabd")
> +  (VABDQ_S "vabd") (VABDQ_U "vabd") (VABDQ_F "vabd")
>(VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd")
> (VADDQ_M_N_F "vadd")
>(VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F
> "vadd")
>(VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F
> "vadd")
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index fb1076aef73..c8cb4e430ac 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -1451,17 +1451,17 @@ (define_insn "mve_vrshrq_n_"
>  ])
> 
>  ;;
> -;; [vabdq_f])
> +;; [vabdq_f]
>  ;;
> -(define_insn "mve_vabdq_f"
> +(define_insn "@mve_q_f"
>[
> (set (match_operand:MVE_0 0 "s_register_operand" "=w")
>   (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand"
> "w")
>  (match_operand:MVE_0 2 "s_register_operand" "w")]
> -  VABDQ_F))
> +  MVE_FP_VABDQ_ONLY))
>]
>"TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vabd.f%#   %q0, %q1, %q2"
> +  ".f%#\t%q0, %q1, %q2"
>[(set_attr "type" "mve_move")
>  ])
> 
> @@ -5483,24 +5483,9 @@ (define_insn "mve_vrmlsldavhaxq_p_sv4si"
>"vpst\;vrmlsldavhaxt.s32\t%Q0, %R0, %q2, %q3"
>[(set_attr "type" "mve_move")
> (set_attr "length""8")])
> -;;
> -;; [vabdq_m_f])
> -;;
> -(define_insn "mve_vabdq_m_f"
> -  [
> -   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> - (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
> -(match_operand:MVE_0 2 "s_register_operand" "w")
> -(match_operand:MVE_0 3 "s_register_operand" "w")
> -(match_operand: 4
> "vpr_register_operand" "Up")]
> -  VABDQ_M_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;vabdt.f%#%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> 
>  ;;
> +;; [vabdq_m_f]
>  ;; [vaddq_m_f]
>  ;; [vsubq_m_f]
>  ;; [vmulq_m_f]
> --
> 2.34.1



RE: [PATCH 07/23] arm: [MVE intrinsics] rework vabdq

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 07/23] arm: [MVE intrinsics] rework vabdq
> 
> Implement vabdq using the new MVE builtins framework.
> 

Ok.
Thanks,
Kyrill

> 2022-09-08  Christophe Lyon 
> 
>   gcc/
>   * config/arm/arm-mve-builtins-base.cc (FUNCTION_WITHOUT_N):
> New.
>   (vabdq): New.
>   * config/arm/arm-mve-builtins-base.def (vabdq): New.
>   * config/arm/arm-mve-builtins-base.h (vabdq): New.
>   * config/arm/arm_mve.h (vabdq): Remove.
>   (vabdq_m): Remove.
>   (vabdq_x): Remove.
>   (vabdq_u8): Remove.
>   (vabdq_s8): Remove.
>   (vabdq_u16): Remove.
>   (vabdq_s16): Remove.
>   (vabdq_u32): Remove.
>   (vabdq_s32): Remove.
>   (vabdq_f16): Remove.
>   (vabdq_f32): Remove.
>   (vabdq_m_s8): Remove.
>   (vabdq_m_s32): Remove.
>   (vabdq_m_s16): Remove.
>   (vabdq_m_u8): Remove.
>   (vabdq_m_u32): Remove.
>   (vabdq_m_u16): Remove.
>   (vabdq_m_f32): Remove.
>   (vabdq_m_f16): Remove.
>   (vabdq_x_s8): Remove.
>   (vabdq_x_s16): Remove.
>   (vabdq_x_s32): Remove.
>   (vabdq_x_u8): Remove.
>   (vabdq_x_u16): Remove.
>   (vabdq_x_u32): Remove.
>   (vabdq_x_f16): Remove.
>   (vabdq_x_f32): Remove.
>   (__arm_vabdq_u8): Remove.
>   (__arm_vabdq_s8): Remove.
>   (__arm_vabdq_u16): Remove.
>   (__arm_vabdq_s16): Remove.
>   (__arm_vabdq_u32): Remove.
>   (__arm_vabdq_s32): Remove.
>   (__arm_vabdq_m_s8): Remove.
>   (__arm_vabdq_m_s32): Remove.
>   (__arm_vabdq_m_s16): Remove.
>   (__arm_vabdq_m_u8): Remove.
>   (__arm_vabdq_m_u32): Remove.
>   (__arm_vabdq_m_u16): Remove.
>   (__arm_vabdq_x_s8): Remove.
>   (__arm_vabdq_x_s16): Remove.
>   (__arm_vabdq_x_s32): Remove.
>   (__arm_vabdq_x_u8): Remove.
>   (__arm_vabdq_x_u16): Remove.
>   (__arm_vabdq_x_u32): Remove.
>   (__arm_vabdq_f16): Remove.
>   (__arm_vabdq_f32): Remove.
>   (__arm_vabdq_m_f32): Remove.
>   (__arm_vabdq_m_f16): Remove.
>   (__arm_vabdq_x_f16): Remove.
>   (__arm_vabdq_x_f32): Remove.
>   (__arm_vabdq): Remove.
>   (__arm_vabdq_m): Remove.
>   (__arm_vabdq_x): Remove.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc  |  10 +
>  gcc/config/arm/arm-mve-builtins-base.def |   2 +
>  gcc/config/arm/arm-mve-builtins-base.h   |   1 +
>  gcc/config/arm/arm_mve.h | 431 ---
>  4 files changed, 13 insertions(+), 431 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-
> mve-builtins-base.cc
> index 8c125657c67..a74119db917 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -146,6 +146,16 @@ namespace arm_mve {
>  UNSPEC##_M_S, -1, -1,\
>  UNSPEC##_M_N_S, -1, -1))
> 
> +  /* Helper for builtins with only unspec codes, _m predicated
> + overrides, but no _n version.  */
> +#define FUNCTION_WITHOUT_N(NAME, UNSPEC) FUNCTION
>   \
> +  (NAME, unspec_mve_function_exact_insn, \
> +   (UNSPEC##_S, UNSPEC##_U, UNSPEC##_F,
>   \
> +-1, -1, -1,  
> \
> +UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,
>   \
> +-1, -1, -1))
> +
> +FUNCTION_WITHOUT_N (vabdq, VABDQ)
>  FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
>  FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
>  FUNCTION_WITHOUT_M_N (vcreateq, VCREATEQ)
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-
> mve-builtins-base.def
> index 5b9966341ce..9230837fd43 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.def
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -18,6 +18,7 @@
> .  */
> 
>  #define REQUIRES_FLOAT false
> +DEF_MVE_FUNCTION (vabdq, binary, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vandq, binary, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vcreateq, create, all_integer_with_64, none)
> @@ -41,6 +42,7 @@ DEF_MVE_FUNCTION (vuninitializedq, inherent,
> all_integer_with_64, none)
>  #undef REQUIRES_FLOAT
> 
>  #define REQUIRES_FLOAT true
> +DEF_MVE_FUNCTION (vabdq, binary, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vandq, binary, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vcreateq, create, all_float, none)
> diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-
> mve-builtins-base.h
> index eeb747d52ad..d9d45d1925a 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.h
> +++ b/gcc/config/arm/arm-mve-builtins-base.h
> @@ -23,6 +23,7 @@
>  namespace arm_mve {
>  namespace function

RE: [PATCH 08/23] arm: [MVE intrinsics] add binary_lshift shape

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 08/23] arm: [MVE intrinsics] add binary_lshift shape
> 
> This patch adds the binary_lshift shape description.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-shapes.cc (binary_lshift): New.
>   * config/arm/arm-mve-builtins-shapes.h (binary_lshift): New.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 57 +++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  2 files changed, 58 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-
> mve-builtins-shapes.cc
> index 28a2d66ddd1..e5093c3f29d 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -519,6 +519,63 @@ struct binary_round_lshift_def : public
> overloaded_base<0>
>  };
>  SHAPE (binary_round_lshift)
> 
> +/* _t vfoo[_t0](_t, _t)
> +   _t vfoo_n[_t0](_t, const int)
> +
> +   i.e. the standard shape for left shift operations that operate on
> +   vector types.
> +
> +   For the MODE_n versions, check that 'imm' is in the [0..#bits-1] range.
> +
> +   Example: vshlq.
> +   int8x16_t [__arm_]vshlq[_s8](int8x16_t a, int8x16_t b)
> +   int8x16_t [__arm_]vshlq_m[_s8](int8x16_t inactive, int8x16_t a, int8x16_t
> b, mve_pred16_t p)
> +   int8x16_t [__arm_]vshlq_x[_s8](int8x16_t a, int8x16_t b, mve_pred16_t p)
> +   int8x16_t [__arm_]vshlq_n[_s8](int8x16_t a, const int imm)
> +   int8x16_t [__arm_]vshlq_m_n[_s8](int8x16_t inactive, int8x16_t a, const
> int imm, mve_pred16_t p)
> +   int8x16_t [__arm_]vshlq_x_n[_s8](int8x16_t a, const int imm,
> mve_pred16_t p)  */
> +struct binary_lshift_def : public overloaded_base<0>
> +{
> +  bool
> +  explicit_mode_suffix_p (enum predication_index, enum
> mode_suffix_index) const override
> +  {
> +return true;
> +  }
> +
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +  bool preserve_user_namespace) const override
> +  {
> +b.add_overloaded_functions (group, MODE_none,
> preserve_user_namespace);
> +b.add_overloaded_functions (group, MODE_n,
> preserve_user_namespace);
> +build_all (b, "v0,v0,vs0", group, MODE_none, preserve_user_namespace);
> +build_all (b, "v0,v0,ss32", group, MODE_n, preserve_user_namespace);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +unsigned int i, nargs;
> +type_suffix_index type;
> +if (!r.check_gp_argument (2, i, nargs)
> + || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES)
> +  return error_mark_node;
> +
> +return r.finish_opt_n_resolution (i, 0, type, TYPE_signed);
> +  }
> +
> +  bool
> +  check (function_checker &c) const override
> +  {
> +if (c.mode_suffix_id != MODE_n)
> +  return true;
> +
> +unsigned int bits = c.type_suffix (0).element_bits;
> +return c.require_immediate_range (1, 0, bits - 1);
> +  }
> +};
> +SHAPE (binary_lshift)
> +
>  /* xN_t vfoo[_t0](uint64_t, uint64_t)
> 
> where there are N arguments in total.
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-
> mve-builtins-shapes.h
> index cef081aa8ec..e472862ceef 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -35,6 +35,7 @@ namespace arm_mve
>{
> 
>  extern const function_shape *const binary;
> +extern const function_shape *const binary_lshift;
>  extern const function_shape *const binary_opt_n;
>  extern const function_shape *const binary_orrq;
>  extern const function_shape *const binary_round_lshift;
> --
> 2.34.1



RE: [PATCH 09/23] arm: [MVE intrinsics] add support for MODE_r

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 09/23] arm: [MVE intrinsics] add support for MODE_r
> 

This is missing a description of what MODE_r is.
I've deduced what it is from looking at the next 3 patches in the series, but I 
think this patch should have at least a one-sentence summary.
Therefore ok with a cover letter.
Thanks,
Kyrill

> 2022-09-08  Christophe Lyon 
> 
>   gcc/
>   * config/arm/arm-mve-builtins.cc (has_inactive_argument)
>   (finish_opt_n_resolution): Handle MODE_r.
>   * config/arm/arm-mve-builtins.def (r): New mode.
> ---
>  gcc/config/arm/arm-mve-builtins.cc  | 8 ++--
>  gcc/config/arm/arm-mve-builtins.def | 1 +
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-
> builtins.cc
> index 91b3ae71f94..c25b1be9903 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -669,7 +669,8 @@ function_instance::has_inactive_argument () const
>if (pred != PRED_m)
>  return false;
> 
> -  if ((base == functions::vorrq && mode_suffix_id == MODE_n)
> +  if (mode_suffix_id == MODE_r
> +  || (base == functions::vorrq && mode_suffix_id == MODE_n)
>|| (base == functions::vqrshlq && mode_suffix_id == MODE_n)
>|| (base == functions::vrshlq && mode_suffix_id == MODE_n))
>  return false;
> @@ -1522,7 +1523,10 @@ finish_opt_n_resolution (unsigned int argno,
> unsigned int first_argno,
>  {
>if (inferred_type == NUM_TYPE_SUFFIXES)
>  inferred_type = first_type;
> -  tree scalar_form = lookup_form (MODE_n, inferred_type);
> +  mode_suffix_index scalar_mode = MODE_n;
> +  if (mode_suffix_id == MODE_r)
> +scalar_mode = MODE_r;
> +  tree scalar_form = lookup_form (scalar_mode, inferred_type);
> 
>/* Allow the final argument to be scalar, if an _n form exists.  */
>if (scalar_argument_p (argno))
> diff --git a/gcc/config/arm/arm-mve-builtins.def b/gcc/config/arm/arm-mve-
> builtins.def
> index 49d07364fa2..e3f37876210 100644
> --- a/gcc/config/arm/arm-mve-builtins.def
> +++ b/gcc/config/arm/arm-mve-builtins.def
> @@ -35,6 +35,7 @@
> 
>  DEF_MVE_MODE (n, none, none, none)
>  DEF_MVE_MODE (offset, none, none, bytes)
> +DEF_MVE_MODE (r, none, none, none)
> 
>  #define REQUIRES_FLOAT false
>  DEF_MVE_TYPE (mve_pred16_t, boolean_type_node)
> --
> 2.34.1



RE: [PATCH 10/23] arm: [MVE intrinsics] add binary_lshift_r shape

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 10/23] arm: [MVE intrinsics] add binary_lshift_r shape
> 
> This patch adds the binary_lshift_r shape description.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-shapes.cc (binary_lshift_r): New.
>   * config/arm/arm-mve-builtins-shapes.h (binary_lshift_r): New.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 41 +++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  2 files changed, 42 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-
> mve-builtins-shapes.cc
> index e5093c3f29d..4ecb612ece5 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -576,6 +576,47 @@ struct binary_lshift_def : public overloaded_base<0>
>  };
>  SHAPE (binary_lshift)
> 
> +/* Used with the above form, but only for the MODE_r case which does
> +   not always support the same set of predicates as MODE_none and
> +   MODE_n.  For vqshlq they are the same, but for vshlq they are not.
> +
> +   _t vfoo_r[_t0](_t, int32_t)
> +
> +   i.e. the standard shape for shift operations that operate on
> +   vector types.
> +   Example: vshlq.
> +   int8x16_t [__arm_]vshlq_r[_s8](int8x16_t a, int32_t b)
> +   int8x16_t [__arm_]vshlq_m_r[_s8](int8x16_t a, int32_t b, mve_pred16_t p)
> */
> +struct binary_lshift_r_def : public overloaded_base<0>
> +{
> +  bool
> +  explicit_mode_suffix_p (enum predication_index, enum
> mode_suffix_index) const override
> +  {
> +return true;
> +  }
> +
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +  bool preserve_user_namespace) const override
> +  {
> +b.add_overloaded_functions (group, MODE_r,
> preserve_user_namespace);
> +build_all (b, "v0,v0,ss32", group, MODE_r, preserve_user_namespace,
> false, preds_m_or_none);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +unsigned int i, nargs;
> +type_suffix_index type;
> +if (!r.check_gp_argument (2, i, nargs)
> + || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES)
> +  return error_mark_node;
> +
> +return r.finish_opt_n_resolution (i, 0, type, TYPE_signed);
> +  }
> +};
> +SHAPE (binary_lshift_r)
> +
>  /* xN_t vfoo[_t0](uint64_t, uint64_t)
> 
> where there are N arguments in total.
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-
> mve-builtins-shapes.h
> index e472862ceef..25d9b60a670 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -36,6 +36,7 @@ namespace arm_mve
> 
>  extern const function_shape *const binary;
>  extern const function_shape *const binary_lshift;
> +extern const function_shape *const binary_lshift_r;
>  extern const function_shape *const binary_opt_n;
>  extern const function_shape *const binary_orrq;
>  extern const function_shape *const binary_round_lshift;
> --
> 2.34.1



RE: [PATCH 11/23] arm: [MVE intrinsics] add unspec_mve_function_exact_insn_vshl

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 11/23] arm: [MVE intrinsics] add
> unspec_mve_function_exact_insn_vshl
> 
> Introduce a function that will be used to build vshl intrinsics. They
> are special because they have to handle MODE_r.
> 

Ok.
Thanks,
Kyrill

> 2022-09-08  Christophe Lyon 
> 
>   gcc/
>   * config/arm/arm-mve-builtins-functions.h (class
>   unspec_mve_function_exact_insn_vshl): New.
> ---
>  gcc/config/arm/arm-mve-builtins-functions.h | 150 
>  1 file changed, 150 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-functions.h
> b/gcc/config/arm/arm-mve-builtins-functions.h
> index 5abf913d182..533fd1159c6 100644
> --- a/gcc/config/arm/arm-mve-builtins-functions.h
> +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> @@ -376,6 +376,156 @@ public:
>}
>  };
> 
> +/* Map the function directly to CODE (UNSPEC, M) for vshl-like
> +   builtins. The difference with unspec_mve_function_exact_insn is
> +   that this function handles MODE_r and the related unspecs.  */
> +class unspec_mve_function_exact_insn_vshl : public function_base
> +{
> +public:
> +  CONSTEXPR unspec_mve_function_exact_insn_vshl (int unspec_for_sint,
> +  int unspec_for_uint,
> +  int unspec_for_n_sint,
> +  int unspec_for_n_uint,
> +  int unspec_for_m_sint,
> +  int unspec_for_m_uint,
> +  int unspec_for_m_n_sint,
> +  int unspec_for_m_n_uint,
> +  int unspec_for_m_r_sint,
> +  int unspec_for_m_r_uint,
> +  int unspec_for_r_sint,
> +  int unspec_for_r_uint)
> +: m_unspec_for_sint (unspec_for_sint),
> +  m_unspec_for_uint (unspec_for_uint),
> +  m_unspec_for_n_sint (unspec_for_n_sint),
> +  m_unspec_for_n_uint (unspec_for_n_uint),
> +  m_unspec_for_m_sint (unspec_for_m_sint),
> +  m_unspec_for_m_uint (unspec_for_m_uint),
> +  m_unspec_for_m_n_sint (unspec_for_m_n_sint),
> +  m_unspec_for_m_n_uint (unspec_for_m_n_uint),
> +  m_unspec_for_m_r_sint (unspec_for_m_r_sint),
> +  m_unspec_for_m_r_uint (unspec_for_m_r_uint),
> +  m_unspec_for_r_sint (unspec_for_r_sint),
> +  m_unspec_for_r_uint (unspec_for_r_uint)
> +  {}
> +
> +  /* The unspec code associated with signed-integer, unsigned-integer
> + and floating-point operations respectively.  It covers the cases
> + with the _n suffix, and/or the _m predicate.  */
> +  int m_unspec_for_sint;
> +  int m_unspec_for_uint;
> +  int m_unspec_for_n_sint;
> +  int m_unspec_for_n_uint;
> +  int m_unspec_for_m_sint;
> +  int m_unspec_for_m_uint;
> +  int m_unspec_for_m_n_sint;
> +  int m_unspec_for_m_n_uint;
> +  int m_unspec_for_m_r_sint;
> +  int m_unspec_for_m_r_uint;
> +  int m_unspec_for_r_sint;
> +  int m_unspec_for_r_uint;
> +
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +insn_code code;
> +switch (e.pred)
> +  {
> +  case PRED_none:
> + switch (e.mode_suffix_id)
> +   {
> +   case MODE_none:
> + /* No predicate, no suffix.  */
> + if (e.type_suffix (0).unsigned_p)
> +   code = code_for_mve_q (m_unspec_for_uint, m_unspec_for_uint,
> e.vector_mode (0));
> + else
> +   code = code_for_mve_q (m_unspec_for_sint, m_unspec_for_sint,
> e.vector_mode (0));
> + break;
> +
> +   case MODE_n:
> + /* No predicate, _n suffix.  */
> + if (e.type_suffix (0).unsigned_p)
> +   code = code_for_mve_q_n (m_unspec_for_n_uint,
> m_unspec_for_n_uint, e.vector_mode (0));
> + else
> +   code = code_for_mve_q_n (m_unspec_for_n_sint,
> m_unspec_for_n_sint, e.vector_mode (0));
> + break;
> +
> +   case MODE_r:
> + /* No predicate, _r suffix.  */
> + if (e.type_suffix (0).unsigned_p)
> +   code = code_for_mve_q_r (m_unspec_for_r_uint,
> m_unspec_for_r_uint, e.vector_mode (0));
> + else
> +   code = code_for_mve_q_r (m_unspec_for_r_sint,
> m_unspec_for_r_sint, e.vector_mode (0));
> + break;
> +
> +   default:
> + gcc_unreachable ();
> +   }
> + return e.use_exact_insn (code);
> +
> +  case PRED_m:
> + switch (e.mode_suffix_id)
> +   {
> +   case MODE_none:
> + /* No suffix, "m" predicate.  */
> + if (e.type_suffix (0).unsigned_p)
> +   code = code_for_mve_q_m (m_unspec_for_m_uint,
> m_unspec_for_m

RE: [PATCH 12/23] arm: [MVE intrinsics] rework vqshlq vshlq

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 12/23] arm: [MVE intrinsics] rework vqshlq vshlq
> 
> Implement vqshlq, vshlq using the new MVE builtins framework.
> 

Ok.
Thanks,
Kyrill

> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_M_N_R):
> New.
>   (vqshlq, vshlq): New.
>   * config/arm/arm-mve-builtins-base.def (vqshlq, vshlq): New.
>   * config/arm/arm-mve-builtins-base.h (vqshlq, vshlq): New.
>   * config/arm/arm_mve.h (vshlq): Remove.
>   (vshlq_r): Remove.
>   (vshlq_n): Remove.
>   (vshlq_m_r): Remove.
>   (vshlq_m): Remove.
>   (vshlq_m_n): Remove.
>   (vshlq_x): Remove.
>   (vshlq_x_n): Remove.
>   (vshlq_s8): Remove.
>   (vshlq_s16): Remove.
>   (vshlq_s32): Remove.
>   (vshlq_u8): Remove.
>   (vshlq_u16): Remove.
>   (vshlq_u32): Remove.
>   (vshlq_r_u8): Remove.
>   (vshlq_n_u8): Remove.
>   (vshlq_r_s8): Remove.
>   (vshlq_n_s8): Remove.
>   (vshlq_r_u16): Remove.
>   (vshlq_n_u16): Remove.
>   (vshlq_r_s16): Remove.
>   (vshlq_n_s16): Remove.
>   (vshlq_r_u32): Remove.
>   (vshlq_n_u32): Remove.
>   (vshlq_r_s32): Remove.
>   (vshlq_n_s32): Remove.
>   (vshlq_m_r_u8): Remove.
>   (vshlq_m_r_s8): Remove.
>   (vshlq_m_r_u16): Remove.
>   (vshlq_m_r_s16): Remove.
>   (vshlq_m_r_u32): Remove.
>   (vshlq_m_r_s32): Remove.
>   (vshlq_m_u8): Remove.
>   (vshlq_m_s8): Remove.
>   (vshlq_m_u16): Remove.
>   (vshlq_m_s16): Remove.
>   (vshlq_m_u32): Remove.
>   (vshlq_m_s32): Remove.
>   (vshlq_m_n_s8): Remove.
>   (vshlq_m_n_s32): Remove.
>   (vshlq_m_n_s16): Remove.
>   (vshlq_m_n_u8): Remove.
>   (vshlq_m_n_u32): Remove.
>   (vshlq_m_n_u16): Remove.
>   (vshlq_x_s8): Remove.
>   (vshlq_x_s16): Remove.
>   (vshlq_x_s32): Remove.
>   (vshlq_x_u8): Remove.
>   (vshlq_x_u16): Remove.
>   (vshlq_x_u32): Remove.
>   (vshlq_x_n_s8): Remove.
>   (vshlq_x_n_s16): Remove.
>   (vshlq_x_n_s32): Remove.
>   (vshlq_x_n_u8): Remove.
>   (vshlq_x_n_u16): Remove.
>   (vshlq_x_n_u32): Remove.
>   (__arm_vshlq_s8): Remove.
>   (__arm_vshlq_s16): Remove.
>   (__arm_vshlq_s32): Remove.
>   (__arm_vshlq_u8): Remove.
>   (__arm_vshlq_u16): Remove.
>   (__arm_vshlq_u32): Remove.
>   (__arm_vshlq_r_u8): Remove.
>   (__arm_vshlq_n_u8): Remove.
>   (__arm_vshlq_r_s8): Remove.
>   (__arm_vshlq_n_s8): Remove.
>   (__arm_vshlq_r_u16): Remove.
>   (__arm_vshlq_n_u16): Remove.
>   (__arm_vshlq_r_s16): Remove.
>   (__arm_vshlq_n_s16): Remove.
>   (__arm_vshlq_r_u32): Remove.
>   (__arm_vshlq_n_u32): Remove.
>   (__arm_vshlq_r_s32): Remove.
>   (__arm_vshlq_n_s32): Remove.
>   (__arm_vshlq_m_r_u8): Remove.
>   (__arm_vshlq_m_r_s8): Remove.
>   (__arm_vshlq_m_r_u16): Remove.
>   (__arm_vshlq_m_r_s16): Remove.
>   (__arm_vshlq_m_r_u32): Remove.
>   (__arm_vshlq_m_r_s32): Remove.
>   (__arm_vshlq_m_u8): Remove.
>   (__arm_vshlq_m_s8): Remove.
>   (__arm_vshlq_m_u16): Remove.
>   (__arm_vshlq_m_s16): Remove.
>   (__arm_vshlq_m_u32): Remove.
>   (__arm_vshlq_m_s32): Remove.
>   (__arm_vshlq_m_n_s8): Remove.
>   (__arm_vshlq_m_n_s32): Remove.
>   (__arm_vshlq_m_n_s16): Remove.
>   (__arm_vshlq_m_n_u8): Remove.
>   (__arm_vshlq_m_n_u32): Remove.
>   (__arm_vshlq_m_n_u16): Remove.
>   (__arm_vshlq_x_s8): Remove.
>   (__arm_vshlq_x_s16): Remove.
>   (__arm_vshlq_x_s32): Remove.
>   (__arm_vshlq_x_u8): Remove.
>   (__arm_vshlq_x_u16): Remove.
>   (__arm_vshlq_x_u32): Remove.
>   (__arm_vshlq_x_n_s8): Remove.
>   (__arm_vshlq_x_n_s16): Remove.
>   (__arm_vshlq_x_n_s32): Remove.
>   (__arm_vshlq_x_n_u8): Remove.
>   (__arm_vshlq_x_n_u16): Remove.
>   (__arm_vshlq_x_n_u32): Remove.
>   (__arm_vshlq): Remove.
>   (__arm_vshlq_r): Remove.
>   (__arm_vshlq_n): Remove.
>   (__arm_vshlq_m_r): Remove.
>   (__arm_vshlq_m): Remove.
>   (__arm_vshlq_m_n): Remove.
>   (__arm_vshlq_x): Remove.
>   (__arm_vshlq_x_n): Remove.
>   (vqshlq): Remove.
>   (vqshlq_r): Remove.
>   (vqshlq_n): Remove.
>   (vqshlq_m_r): Remove.
>   (vqshlq_m_n): Remove.
>   (vqshlq_m): Remove.
>   (vqshlq_u8): Remove.
>   (vqshlq_r_u8): Remove.
>   (vqshlq_n_u8): Remove.
>   (vqshlq_s8): Remove.
>   (vqshlq_r_s8): Remove.
>   (vqshlq_n_s8): Remove.
>   (vqshlq_u16): Remove.
>   (vqshlq_r_u16): Remove.
>   (vqshlq_n_u16): Remove.
>   (vqshlq_s16): Remove.
>   (vqshlq_r_s16): Remove.
>   (vqshlq_n_s16): Remove.
>   (vqshl
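
As a reference for what the reworked intrinsics compute, the lane-wise behaviour can be sketched in scalar form. This is a model based on my reading of the ACLE description (vshlq shifts each element by a per-lane signed amount, with negative counts shifting right; vqshlq additionally saturates); the helper name is invented for illustration:

```python
def vqshl_lane_s8(x, shift):
    """One int8 lane of vqshlq: shift left by a signed per-lane
    amount (a negative amount shifts right), saturating to int8.
    vshlq is the same operation but wraps instead of saturating."""
    r = x << shift if shift >= 0 else x >> -shift
    return max(-128, min(127, r))

print(vqshl_lane_s8(100, 1))   # 200 saturates to 127
print(vqshl_lane_s8(-100, 1))  # -200 saturates to -128
print(vqshl_lane_s8(64, -2))   # negative shift: 64 >> 2 == 16
```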

RE: [PATCH 13/23] arm: [MVE intrinsics] factorize vmaxq vminq

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 13/23] arm: [MVE intrinsics] factorize vmaxq vminq
> 
> Factorize vmaxq and vminq so that they use the same pattern.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon 
> 
>   gcc/
>   * config/arm/iterators.md (MAX_MIN_SU): New.
>   (max_min_su_str): New.
>   (max_min_supf): New.
>   * config/arm/mve.md (mve_vmaxq_s<mode>, mve_vmaxq_u<mode>)
>   (mve_vminq_s<mode>, mve_vminq_u<mode>): Merge into ...
>   (mve_<max_min_su_str>q_<max_min_supf><mode>): ... this.
> ---
>  gcc/config/arm/iterators.md | 11 ++
>  gcc/config/arm/mve.md   | 44 +
>  2 files changed, 16 insertions(+), 39 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 3133642ea82..9ff61e0573b 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -330,6 +330,9 @@ (define_code_iterator FCVT [unsigned_float float])
>  ;; Saturating addition, subtraction
>  (define_code_iterator SSPLUSMINUS [ss_plus ss_minus])
> 
> +;; Max/Min iterator, to factorize MVE patterns
> +(define_code_iterator MAX_MIN_SU [smax umax smin umin])
> +
>  ;; MVE integer binary operations.
>  (define_code_iterator MVE_INT_BINARY_RTX [plus minus mult])
> 
> @@ -1271,6 +1274,14 @@ (define_code_attr float_sup [(unsigned_float "u")
> (float "s")])
> 
>  (define_code_attr float_SUP [(unsigned_float "U") (float "S")])
> 
> +;; max/min for MVE
> +(define_code_attr max_min_su_str [(smax "vmax") (umax "vmax") (smin
> "vmin") (umin "vmin")])
> +
> +(define_code_attr max_min_supf [
> +  (smax "s") (umax "u")
> +  (smin "s") (umin "u")
> +  ])
> +
>  ;;
>  ;; Int attributes
>  ;;
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index c8cb4e430ac..44409b40e5f 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -1106,29 +1106,20 @@ (define_insn "mve_vmaxavq_s"
>  ])
> 
>  ;;
> -;; [vmaxq_u, vmaxq_s])
> +;; [vmaxq_u, vmaxq_s]
> +;; [vminq_s, vminq_u]
>  ;;
> -(define_insn "mve_vmaxq_s"
> +(define_insn "mve_<max_min_su_str>q_<max_min_supf><mode>"
>[
> (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> - (smax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
> + (MAX_MIN_SU:MVE_2 (match_operand:MVE_2 1
> "s_register_operand" "w")
>   (match_operand:MVE_2 2 "s_register_operand" "w")))
>]
>"TARGET_HAVE_MVE"
> -  "vmax.%#\t%q0, %q1, %q2"
> +  ".%#\t%q0, %q1, %q2"
>[(set_attr "type" "mve_move")
>  ])
> 
> -(define_insn "mve_vmaxq_u"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> - (umax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
> - (match_operand:MVE_2 2 "s_register_operand" "w")))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vmax.%#\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> 
>  ;;
>  ;; [vmaxvq_u, vmaxvq_s])
> @@ -1175,31 +1166,6 @@ (define_insn "mve_vminavq_s"
>[(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vminq_s, vminq_u])
> -;;
> -(define_insn "mve_vminq_s"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> - (smin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
> - (match_operand:MVE_2 2 "s_register_operand" "w")))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vmin.%#\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -(define_insn "mve_vminq_u"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> - (umin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
> - (match_operand:MVE_2 2 "s_register_operand" "w")))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vmin.%#\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vminvq_u, vminvq_s])
>  ;;
> --
> 2.34.1
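
To see what the single merged pattern expands to, the code iterator and the two code attributes added above can be modelled as plain lookup tables. This assumes the merged define_insn takes its name from the <max_min_su_str> and <max_min_supf> attributes plus the mode, which matches the four patterns the patch deletes:

```python
# The two code attrs quoted in the patch, as plain dicts:
max_min_su_str = {"smax": "vmax", "umax": "vmax", "smin": "vmin", "umin": "vmin"}
max_min_supf   = {"smax": "s", "umax": "u", "smin": "s", "umin": "u"}

# One template pattern plus the MAX_MIN_SU code iterator expand to
# the four insns that used to be written out by hand:
names = [f"mve_{max_min_su_str[c]}q_{max_min_supf[c]}<mode>"
         for c in ("smax", "umax", "smin", "umin")]
print(names)
```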



RE: [PATCH 14/23] arm: [MVE intrinsics] rework vmaxq vminq

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 14/23] arm: [MVE intrinsics] rework vmaxq vminq
> 
> Implement vmaxq and vminq using the new MVE builtins framework.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-base.cc
> (FUNCTION_WITH_RTX_M_NO_F): New.
>   (vmaxq, vminq): New.
>   * config/arm/arm-mve-builtins-base.def (vmaxq, vminq): New.
>   * config/arm/arm-mve-builtins-base.h (vmaxq, vminq): New.
>   * config/arm/arm_mve.h (vminq): Remove.
>   (vmaxq): Remove.
>   (vmaxq_m): Remove.
>   (vminq_m): Remove.
>   (vminq_x): Remove.
>   (vmaxq_x): Remove.
>   (vminq_u8): Remove.
>   (vmaxq_u8): Remove.
>   (vminq_s8): Remove.
>   (vmaxq_s8): Remove.
>   (vminq_u16): Remove.
>   (vmaxq_u16): Remove.
>   (vminq_s16): Remove.
>   (vmaxq_s16): Remove.
>   (vminq_u32): Remove.
>   (vmaxq_u32): Remove.
>   (vminq_s32): Remove.
>   (vmaxq_s32): Remove.
>   (vmaxq_m_s8): Remove.
>   (vmaxq_m_s32): Remove.
>   (vmaxq_m_s16): Remove.
>   (vmaxq_m_u8): Remove.
>   (vmaxq_m_u32): Remove.
>   (vmaxq_m_u16): Remove.
>   (vminq_m_s8): Remove.
>   (vminq_m_s32): Remove.
>   (vminq_m_s16): Remove.
>   (vminq_m_u8): Remove.
>   (vminq_m_u32): Remove.
>   (vminq_m_u16): Remove.
>   (vminq_x_s8): Remove.
>   (vminq_x_s16): Remove.
>   (vminq_x_s32): Remove.
>   (vminq_x_u8): Remove.
>   (vminq_x_u16): Remove.
>   (vminq_x_u32): Remove.
>   (vmaxq_x_s8): Remove.
>   (vmaxq_x_s16): Remove.
>   (vmaxq_x_s32): Remove.
>   (vmaxq_x_u8): Remove.
>   (vmaxq_x_u16): Remove.
>   (vmaxq_x_u32): Remove.
>   (__arm_vminq_u8): Remove.
>   (__arm_vmaxq_u8): Remove.
>   (__arm_vminq_s8): Remove.
>   (__arm_vmaxq_s8): Remove.
>   (__arm_vminq_u16): Remove.
>   (__arm_vmaxq_u16): Remove.
>   (__arm_vminq_s16): Remove.
>   (__arm_vmaxq_s16): Remove.
>   (__arm_vminq_u32): Remove.
>   (__arm_vmaxq_u32): Remove.
>   (__arm_vminq_s32): Remove.
>   (__arm_vmaxq_s32): Remove.
>   (__arm_vmaxq_m_s8): Remove.
>   (__arm_vmaxq_m_s32): Remove.
>   (__arm_vmaxq_m_s16): Remove.
>   (__arm_vmaxq_m_u8): Remove.
>   (__arm_vmaxq_m_u32): Remove.
>   (__arm_vmaxq_m_u16): Remove.
>   (__arm_vminq_m_s8): Remove.
>   (__arm_vminq_m_s32): Remove.
>   (__arm_vminq_m_s16): Remove.
>   (__arm_vminq_m_u8): Remove.
>   (__arm_vminq_m_u32): Remove.
>   (__arm_vminq_m_u16): Remove.
>   (__arm_vminq_x_s8): Remove.
>   (__arm_vminq_x_s16): Remove.
>   (__arm_vminq_x_s32): Remove.
>   (__arm_vminq_x_u8): Remove.
>   (__arm_vminq_x_u16): Remove.
>   (__arm_vminq_x_u32): Remove.
>   (__arm_vmaxq_x_s8): Remove.
>   (__arm_vmaxq_x_s16): Remove.
>   (__arm_vmaxq_x_s32): Remove.
>   (__arm_vmaxq_x_u8): Remove.
>   (__arm_vmaxq_x_u16): Remove.
>   (__arm_vmaxq_x_u32): Remove.
>   (__arm_vminq): Remove.
>   (__arm_vmaxq): Remove.
>   (__arm_vmaxq_m): Remove.
>   (__arm_vminq_m): Remove.
>   (__arm_vminq_x): Remove.
>   (__arm_vmaxq_x): Remove.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc  |  11 +
>  gcc/config/arm/arm-mve-builtins-base.def |   2 +
>  gcc/config/arm/arm-mve-builtins-base.h   |   2 +
>  gcc/config/arm/arm_mve.h | 628 ---
>  4 files changed, 15 insertions(+), 628 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-
> mve-builtins-base.cc
> index 4bebf86f784..1839d5cb1a5 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -110,6 +110,15 @@ namespace arm_mve {
>  UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,
>   \
>  UNSPEC##_M_N_S, UNSPEC##_M_N_U, -1))
> 
> +  /* Helper for builtins with RTX codes, _m predicated override, but
> + no floating-point versions.  */
> +#define FUNCTION_WITH_RTX_M_NO_F(NAME, RTX_S, RTX_U, UNSPEC)
> FUNCTION  \
> +  (NAME, unspec_based_mve_function_exact_insn,
>   \
> +   (RTX_S, RTX_U, UNKNOWN,   \
> +-1, -1, -1,  
> \
> +UNSPEC##_M_S, UNSPEC##_M_U, -1,
>   \
> +-1, -1, -1))
> +
>/* Helper for builtins without RTX codes, no _m predicated and no _n
>   overrides.  */
>  #define FUNCTION_WITHOUT_M_N(NAME, UNSPEC) FUNCTION
>   \
> @@ -173,6 +182,8 @@ FUNCTION_WITHOUT_M_N (vcreateq, VCREATEQ)
>  FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
>  FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
>  FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ)
> +FUNCTION_WITH_RTX_M_NO_F (vmaxq, SMAX, UMAX, VMAXQ)
> +FUNCT

RE: [PATCH 15/23] arm: [MVE intrinsics] add binary_rshift_narrow shape

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 15/23] arm: [MVE intrinsics] add binary_rshift_narrow shape
> 
> This patch adds the binary_rshift_narrow shape description.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-shapes.cc (binary_rshift_narrow):
>   New.
>   * config/arm/arm-mve-builtins-shapes.h (binary_rshift_narrow):
> New.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 47 +++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  2 files changed, 48 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-
> mve-builtins-shapes.cc
> index 4ecb612ece5..88934e1ca15 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -617,6 +617,53 @@ struct binary_lshift_r_def : public
> overloaded_base<0>
>  };
>  SHAPE (binary_lshift_r)
> 
> +/* _t vfoo[_n_t0](_t, _t, const int)
> +
> +   Narrowing right shifts.
> +   Check that 'imm' is in the [1..#bits/2] range.
> +
> +   Example: vqrshrnbq.
> +   int8x16_t [__arm_]vqrshrnbq[_n_s16](int8x16_t a, int16x8_t b, const int
> imm)
> +   int8x16_t [__arm_]vqrshrnbq_m[_n_s16](int8x16_t a, int16x8_t b, const int
> imm, mve_pred16_t p)  */
> +struct binary_rshift_narrow_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +  bool preserve_user_namespace) const override
> +  {
> +b.add_overloaded_functions (group, MODE_n,
> preserve_user_namespace);
> +build_all (b, "vh0,vh0,v0,ss32", group, MODE_n,
> preserve_user_namespace);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +unsigned int i, nargs;
> +type_suffix_index type;
> +if (!r.check_gp_argument (3, i, nargs)
> + || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES
> + || !r.require_integer_immediate (i))
> +  return error_mark_node;
> +
> +type_suffix_index narrow_suffix
> +  = find_type_suffix (type_suffixes[type].tclass,
> +   type_suffixes[type].element_bits / 2);
> +
> +if (!r.require_matching_vector_type (0, narrow_suffix))
> +  return error_mark_node;
> +
> +return r.resolve_to (r.mode_suffix_id, type);
> +  }
> +
> +  bool
> +  check (function_checker &c) const override
> +  {
> +unsigned int bits = c.type_suffix (0).element_bits;
> +return c.require_immediate_range (2, 1, bits / 2);
> +  }
> +};
> +SHAPE (binary_rshift_narrow)
> +
>  /* xN_t vfoo[_t0](uint64_t, uint64_t)
> 
> where there are N arguments in total.
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-
> mve-builtins-shapes.h
> index 25d9b60a670..d72686d187b 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -40,6 +40,7 @@ namespace arm_mve
>  extern const function_shape *const binary_opt_n;
>  extern const function_shape *const binary_orrq;
>  extern const function_shape *const binary_round_lshift;
> +extern const function_shape *const binary_rshift_narrow;
>  extern const function_shape *const create;
>  extern const function_shape *const inherent;
>  extern const function_shape *const unary_convert;
> --
> 2.34.1
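
The shape's immediate-range check and the narrowing shift it guards can be sketched per lane. This models vqrshrnbq_n_s16 as I understand the ACLE description (rounding adds 1 << (imm - 1) before shifting); the interleaving into bottom byte lanes done by the real instruction is left out:

```python
def vqrshrnb_n_s16_lane(b_lane, imm):
    """Rounding right shift of one signed 16-bit lane by imm,
    saturated to int8.  The real vqrshrnbq writes each result into
    the even (bottom) byte lanes of its first operand."""
    assert 1 <= imm <= 8          # the [1..#bits/2] range the shape enforces
    r = (b_lane + (1 << (imm - 1))) >> imm
    return max(-128, min(127, r))

print(vqrshrnb_n_s16_lane(1000, 2))   # 250 saturates to 127
print(vqrshrnb_n_s16_lane(100, 3))    # (100 + 4) >> 3 == 13
```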



RE: [PATCH 16/23] arm: [MVE intrinsics] factorize vshrntq vshrnbq vrshrnbq vrshrntq vqshrnbq vqshrntq vqrshrnbq vqrshrntq

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 16/23] arm: [MVE intrinsics] factorize vshrntq vshrnbq
> vrshrnbq vrshrntq vqshrnbq vqshrntq vqrshrnbq vqrshrntq
> 
> Factorize vqshrnbq, vqshrntq, vqrshrnbq, vqrshrntq, vshrntq, vshrnbq,
> vrshrnbq and vrshrntq so that they use the same pattern.
> 
> Introduce the <isu> iterator for *shrn* so that we can use the same
> pattern despite the different "s", "u" and "i" suffixes.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon 
> 
>   gcc/
>   * config/arm/iterators.md (MVE_SHRN_N, MVE_SHRN_M_N): New.
>   (mve_insn): Add vqrshrnb, vqrshrnt, vqshrnb, vqshrnt, vrshrnb,
>   vrshrnt, vshrnb, vshrnt.
>   (isu): New.
>   * config/arm/mve.md (mve_vqrshrnbq_n_)
>   (mve_vqrshrntq_n_,
> mve_vqshrnbq_n_)
>   (mve_vqshrntq_n_, mve_vrshrnbq_n_)
>   (mve_vrshrntq_n_, mve_vshrnbq_n_)
>   (mve_vshrntq_n_): Merge into ...
>   (@mve_q_n_): ... this.
>   (mve_vqrshrnbq_m_n_,
> mve_vqrshrntq_m_n_)
>   (mve_vqshrnbq_m_n_,
> mve_vqshrntq_m_n_)
>   (mve_vrshrnbq_m_n_,
> mve_vrshrntq_m_n_)
>   (mve_vshrnbq_m_n_,
> mve_vshrntq_m_n_):
>   Merge into ...
>   (@mve_q_m_n_): ... this.
> ---
>  gcc/config/arm/iterators.md |  57 
>  gcc/config/arm/mve.md   | 270 
>  2 files changed, 85 insertions(+), 242 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 9ff61e0573b..d64c924a513 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -468,6 +468,28 @@ (define_int_iterator MVE_RSHIFT_N   [
>VRSHLQ_N_S VRSHLQ_N_U
>])
> 
> +(define_int_iterator MVE_SHRN_N [
> +  VQRSHRNBQ_N_S VQRSHRNBQ_N_U
> +  VQRSHRNTQ_N_S VQRSHRNTQ_N_U
> +  VQSHRNBQ_N_S VQSHRNBQ_N_U
> +  VQSHRNTQ_N_S VQSHRNTQ_N_U
> +  VRSHRNBQ_N_S VRSHRNBQ_N_U
> +  VRSHRNTQ_N_S VRSHRNTQ_N_U
> +  VSHRNBQ_N_S VSHRNBQ_N_U
> +  VSHRNTQ_N_S VSHRNTQ_N_U
> +  ])
> +
> +(define_int_iterator MVE_SHRN_M_N [
> +  VQRSHRNBQ_M_N_S VQRSHRNBQ_M_N_U
> +  VQRSHRNTQ_M_N_S VQRSHRNTQ_M_N_U
> +  VQSHRNBQ_M_N_S VQSHRNBQ_M_N_U
> +  VQSHRNTQ_M_N_S VQSHRNTQ_M_N_U
> +  VRSHRNBQ_M_N_S VRSHRNBQ_M_N_U
> +  VRSHRNTQ_M_N_S VRSHRNTQ_M_N_U
> +  VSHRNBQ_M_N_S VSHRNBQ_M_N_U
> +  VSHRNTQ_M_N_S VSHRNTQ_M_N_U
> +  ])
> +
>  (define_int_iterator MVE_FP_M_BINARY   [
>VABDQ_M_F
>VADDQ_M_F
> @@ -568,12 +590,20 @@ (define_int_attr mve_insn [
>(VQRSHLQ_M_S "vqrshl") (VQRSHLQ_M_U "vqrshl")
>(VQRSHLQ_N_S "vqrshl") (VQRSHLQ_N_U "vqrshl")
>(VQRSHLQ_S "vqrshl") (VQRSHLQ_U "vqrshl")
> +  (VQRSHRNBQ_M_N_S "vqrshrnb") (VQRSHRNBQ_M_N_U
> "vqrshrnb")
> +  (VQRSHRNBQ_N_S "vqrshrnb") (VQRSHRNBQ_N_U
> "vqrshrnb")
> +  (VQRSHRNTQ_M_N_S "vqrshrnt") (VQRSHRNTQ_M_N_U
> "vqrshrnt")
> +  (VQRSHRNTQ_N_S "vqrshrnt") (VQRSHRNTQ_N_U "vqrshrnt")
>(VQSHLQ_M_N_S "vqshl") (VQSHLQ_M_N_U "vqshl")
>(VQSHLQ_M_R_S "vqshl") (VQSHLQ_M_R_U "vqshl")
>(VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
>(VQSHLQ_N_S "vqshl") (VQSHLQ_N_U "vqshl")
>(VQSHLQ_R_S "vqshl") (VQSHLQ_R_U "vqshl")
>(VQSHLQ_S "vqshl") (VQSHLQ_U "vqshl")
> +  (VQSHRNBQ_M_N_S "vqshrnb") (VQSHRNBQ_M_N_U
> "vqshrnb")
> +  (VQSHRNBQ_N_S "vqshrnb") (VQSHRNBQ_N_U "vqshrnb")
> +  (VQSHRNTQ_M_N_S "vqshrnt") (VQSHRNTQ_M_N_U
> "vqshrnt")
> +  (VQSHRNTQ_N_S "vqshrnt") (VQSHRNTQ_N_U "vqshrnt")
>(VQSUBQ_M_N_S "vqsub") (VQSUBQ_M_N_U "vqsub")
>(VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
>(VQSUBQ_N_S "vqsub") (VQSUBQ_N_U "vqsub")
> @@ -586,17 +616,44 @@ (define_int_attr mve_insn [
>(VRSHLQ_M_S "vrshl") (VRSHLQ_M_U "vrshl")
>(VRSHLQ_N_S "vrshl") (VRSHLQ_N_U "vrshl")
>(VRSHLQ_S "vrshl") (VRSHLQ_U "vrshl")
> +  (VRSHRNBQ_M_N_S "vrshrnb") (VRSHRNBQ_M_N_U
> "vrshrnb")
> +  (VRSHRNBQ_N_S "vrshrnb") (VRSHRNBQ_N_U "vrshrnb")
> +  (VRSHRNTQ_M_N_S "vrshrnt") (VRSHRNTQ_M_N_U
> "vrshrnt")
> +  (VRSHRNTQ_N_S "vrshrnt") (VRSHRNTQ_N_U "vrshrnt")
>(VSHLQ_M_N_S "vshl") (VSHLQ_M_N_U "vshl")
>(VSHLQ_M_R_S "vshl") (VSHLQ_M_R_U "vshl")
>(VSHLQ_M_S "vshl") (VSHLQ_M_U "vshl")
>(VSHLQ_N_S "vshl") (VSHLQ_N_U "vshl")
>(VSHLQ_R_S "vs

Re: [libstdc++] use strtold for from_chars even without locale

2023-05-05 Thread Jonathan Wakely via Gcc-patches
On Fri, 5 May 2023 at 11:39, Alexandre Oliva  wrote:

> Here's a patch to skip/xfail the bits that are expected to fail on
> aarch64-vxworks.
>

OK for trunk and gcc-13, thanks.


>
>
> [libstdc++] [testsuite] xfail double-prec from_chars for ldbl
>
> When long double is wider than double, but from_chars is implemented
> in terms of double, tests that involve the full precision of long
> double are expected to fail.  Mark them as such on aarch64-*-vxworks.
>
>
> for  libstdc++-v3/ChangeLog
>
> * testsuite/20_util/from_chars/4.cc: Skip long double test06
> on aarch64-vxworks.
> * testsuite/20_util/to_chars/long_double.cc: Xfail run on
> aarch64-vxworks.
> ---
>  libstdc++-v3/testsuite/20_util/from_chars/4.cc |3 ++-
>  .../testsuite/20_util/to_chars/long_double.cc  |4 
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/testsuite/20_util/from_chars/4.cc
> b/libstdc++-v3/testsuite/20_util/from_chars/4.cc
> index dd55690eb6511..c3594f9014bd3 100644
> --- a/libstdc++-v3/testsuite/20_util/from_chars/4.cc
> +++ b/libstdc++-v3/testsuite/20_util/from_chars/4.cc
> @@ -18,6 +18,7 @@
>  //  is supported in C++14 as a GNU extension
>  // { dg-do run { target c++14 } }
>  // { dg-add-options ieee }
> +// { dg-additional-options "-DSKIP_LONG_DOUBLE" { target
> aarch64-*-vxworks* } }
>
>  #include 
>  #include 
> @@ -354,7 +355,7 @@ test06()
>  {
>test_max_mantissa();
>test_max_mantissa();
> -#ifdef __GLIBCXX_TYPE_INT_N_0
> +#if defined __GLIBCXX_TYPE_INT_N_0 && !defined SKIP_LONG_DOUBLE
>test_max_mantissa();
>  #endif
>  }
> diff --git a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
> b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
> index 880c98021876d..263144bd42cba 100644
> --- a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
> +++ b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
> @@ -34,6 +34,10 @@
>  // more portable and robust to differences in system printf behavior.
>  // { dg-xfail-run-if "Non-conforming printf (see PR98384)" { *-*-solaris*
> *-*-darwin* } }
>
> +// On systems that use double-precision from_chars for long double,
> +// this is expected to fail.
> +// { dg-xfail-run-if "from_chars limited to double-precision" {
> aarch64-*-vxworks* } }
> +
>  // { dg-require-effective-target ieee_floats }
>  // { dg-require-effective-target size32plus }
>  // { dg-require-cmath "" }
>
>
> --
> Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 
>
>
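
The failure mode the xfail covers is easy to reproduce with a scalar model: Python's float is an IEEE double, so it behaves like a from_chars that parses long double input via double. A value needing more than double's 53-bit mantissa, such as 1 + 2**-60 (exact in x86's 80-bit long double), does not survive the round trip:

```python
from decimal import Decimal, getcontext

getcontext().prec = 80                   # enough digits to hold 1 + 2**-60 exactly
s = str(Decimal(1) + Decimal(2) ** -60)  # decimal string of a long-double-only value
parsed = float(s)                        # parse via double, like the fallback
print(parsed == 1.0)                     # True: the 2**-60 part is lost
```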


RE: [PATCH 17/23] arm: [MVE intrinsics] rework vshrnbq vshrntq vrshrnbq vrshrntq vqshrnbq vqshrntq vqrshrnbq vqrshrntq

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 17/23] arm: [MVE intrinsics] rework vshrnbq vshrntq
> vrshrnbq vrshrntq vqshrnbq vqshrntq vqrshrnbq vqrshrntq
> 
> Implement vshrnbq, vshrntq, vrshrnbq, vrshrntq, vqshrnbq, vqshrntq,
> vqrshrnbq, vqrshrntq using the new MVE builtins framework.

Ok with a style nit...

> 
> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-base.cc (FUNCTION_ONLY_N_NO_F):
> New.
>   (vshrnbq, vshrntq, vrshrnbq, vrshrntq, vqshrnbq, vqshrntq)
>   (vqrshrnbq, vqrshrntq): New.
>   * config/arm/arm-mve-builtins-base.def (vshrnbq, vshrntq)
>   (vrshrnbq, vrshrntq, vqshrnbq, vqshrntq, vqrshrnbq, vqrshrntq):
>   New.
>   * config/arm/arm-mve-builtins-base.h (vshrnbq, vshrntq, vrshrnbq)
>   (vrshrntq, vqshrnbq, vqshrntq, vqrshrnbq, vqrshrntq): New.
>   * config/arm/arm-mve-builtins.cc
>   (function_instance::has_inactive_argument): Handle vshrnbq,
>   vshrntq, vrshrnbq, vrshrntq, vqshrnbq, vqshrntq, vqrshrnbq,
>   vqrshrntq.
>   * config/arm/arm_mve.h (vshrnbq): Remove.
>   (vshrntq): Remove.
>   (vshrnbq_m): Remove.
>   (vshrntq_m): Remove.
>   (vshrnbq_n_s16): Remove.
>   (vshrntq_n_s16): Remove.
>   (vshrnbq_n_u16): Remove.
>   (vshrntq_n_u16): Remove.
>   (vshrnbq_n_s32): Remove.
>   (vshrntq_n_s32): Remove.
>   (vshrnbq_n_u32): Remove.
>   (vshrntq_n_u32): Remove.
>   (vshrnbq_m_n_s32): Remove.
>   (vshrnbq_m_n_s16): Remove.
>   (vshrnbq_m_n_u32): Remove.
>   (vshrnbq_m_n_u16): Remove.
>   (vshrntq_m_n_s32): Remove.
>   (vshrntq_m_n_s16): Remove.
>   (vshrntq_m_n_u32): Remove.
>   (vshrntq_m_n_u16): Remove.
>   (__arm_vshrnbq_n_s16): Remove.
>   (__arm_vshrntq_n_s16): Remove.
>   (__arm_vshrnbq_n_u16): Remove.
>   (__arm_vshrntq_n_u16): Remove.
>   (__arm_vshrnbq_n_s32): Remove.
>   (__arm_vshrntq_n_s32): Remove.
>   (__arm_vshrnbq_n_u32): Remove.
>   (__arm_vshrntq_n_u32): Remove.
>   (__arm_vshrnbq_m_n_s32): Remove.
>   (__arm_vshrnbq_m_n_s16): Remove.
>   (__arm_vshrnbq_m_n_u32): Remove.
>   (__arm_vshrnbq_m_n_u16): Remove.
>   (__arm_vshrntq_m_n_s32): Remove.
>   (__arm_vshrntq_m_n_s16): Remove.
>   (__arm_vshrntq_m_n_u32): Remove.
>   (__arm_vshrntq_m_n_u16): Remove.
>   (__arm_vshrnbq): Remove.
>   (__arm_vshrntq): Remove.
>   (__arm_vshrnbq_m): Remove.
>   (__arm_vshrntq_m): Remove.
>   (vrshrnbq): Remove.
>   (vrshrntq): Remove.
>   (vrshrnbq_m): Remove.
>   (vrshrntq_m): Remove.
>   (vrshrnbq_n_s16): Remove.
>   (vrshrntq_n_s16): Remove.
>   (vrshrnbq_n_u16): Remove.
>   (vrshrntq_n_u16): Remove.
>   (vrshrnbq_n_s32): Remove.
>   (vrshrntq_n_s32): Remove.
>   (vrshrnbq_n_u32): Remove.
>   (vrshrntq_n_u32): Remove.
>   (vrshrnbq_m_n_s32): Remove.
>   (vrshrnbq_m_n_s16): Remove.
>   (vrshrnbq_m_n_u32): Remove.
>   (vrshrnbq_m_n_u16): Remove.
>   (vrshrntq_m_n_s32): Remove.
>   (vrshrntq_m_n_s16): Remove.
>   (vrshrntq_m_n_u32): Remove.
>   (vrshrntq_m_n_u16): Remove.
>   (__arm_vrshrnbq_n_s16): Remove.
>   (__arm_vrshrntq_n_s16): Remove.
>   (__arm_vrshrnbq_n_u16): Remove.
>   (__arm_vrshrntq_n_u16): Remove.
>   (__arm_vrshrnbq_n_s32): Remove.
>   (__arm_vrshrntq_n_s32): Remove.
>   (__arm_vrshrnbq_n_u32): Remove.
>   (__arm_vrshrntq_n_u32): Remove.
>   (__arm_vrshrnbq_m_n_s32): Remove.
>   (__arm_vrshrnbq_m_n_s16): Remove.
>   (__arm_vrshrnbq_m_n_u32): Remove.
>   (__arm_vrshrnbq_m_n_u16): Remove.
>   (__arm_vrshrntq_m_n_s32): Remove.
>   (__arm_vrshrntq_m_n_s16): Remove.
>   (__arm_vrshrntq_m_n_u32): Remove.
>   (__arm_vrshrntq_m_n_u16): Remove.
>   (__arm_vrshrnbq): Remove.
>   (__arm_vrshrntq): Remove.
>   (__arm_vrshrnbq_m): Remove.
>   (__arm_vrshrntq_m): Remove.
>   (vqshrnbq): Remove.
>   (vqshrntq): Remove.
>   (vqshrnbq_m): Remove.
>   (vqshrntq_m): Remove.
>   (vqshrnbq_n_s16): Remove.
>   (vqshrntq_n_s16): Remove.
>   (vqshrnbq_n_u16): Remove.
>   (vqshrntq_n_u16): Remove.
>   (vqshrnbq_n_s32): Remove.
>   (vqshrntq_n_s32): Remove.
>   (vqshrnbq_n_u32): Remove.
>   (vqshrntq_n_u32): Remove.
>   (vqshrnbq_m_n_s32): Remove.
>   (vqshrnbq_m_n_s16): Remove.
>   (vqshrnbq_m_n_u32): Remove.
>   (vqshrnbq_m_n_u16): Remove.
>   (vqshrntq_m_n_s32): Remove.
>   (vqshrntq_m_n_s16): Remove.
>   (vqshrntq_m_n_u32): Remove.
>   (vqshrntq_m_n_u16): Remove.
>   (__arm_vqshrnbq_n_s16): Remove.
>   (__arm_vqshrntq_n_s16): Remove.
>   (__arm_vqshrnbq_n_u16): Remove.
>   (__arm_vqshrntq_n_u16): Remove.
>   (__arm_vqshrnbq

RE: [PATCH 18/23] arm: [MVE intrinsics] add binary_rshift_narrow_unsigned shape

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 18/23] arm: [MVE intrinsics] add
> binary_rshift_narrow_unsigned shape
> 
> This patch adds the binary_rshift_narrow_unsigned shape description.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-shapes.cc
>   (binary_rshift_narrow_unsigned): New.
>   * config/arm/arm-mve-builtins-shapes.h
>   (binary_rshift_narrow_unsigned): New.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 48 +++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  2 files changed, 49 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-
> mve-builtins-shapes.cc
> index 88934e1ca15..e3bf586565c 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -664,6 +664,54 @@ struct binary_rshift_narrow_def : public
> overloaded_base<0>
>  };
>  SHAPE (binary_rshift_narrow)
> 
> +/* _t vfoo[_n_t0](_t, _t, const int)
> +
> +   Vector saturating rounding shift right and narrow.
> +   Check that 'imm' is in the [1..#bits/2] range.
> +
> +   Example: vqshrunbq.
> +   uint8x16_t [__arm_]vqshrunbq[_n_s16](uint8x16_t a, int16x8_t b, const int
> imm)
> +   uint8x16_t [__arm_]vqshrunbq_m[_n_s16](uint8x16_t a, int16x8_t b,
> const int imm, mve_pred16_t p)  */
> +struct binary_rshift_narrow_unsigned_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +  bool preserve_user_namespace) const override
> +  {
> +b.add_overloaded_functions (group, MODE_n,
> preserve_user_namespace);
> +build_all (b, "vhu0,vhu0,v0,ss32", group, MODE_n,
> preserve_user_namespace);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +unsigned int i, nargs;
> +type_suffix_index type;
> +if (!r.check_gp_argument (3, i, nargs)
> + || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES
> + || !r.require_integer_immediate (i))
> +  return error_mark_node;
> +
> +type_suffix_index narrow_suffix
> +  = find_type_suffix (TYPE_unsigned,
> +   type_suffixes[type].element_bits / 2);
> +
> +if (!r.require_matching_vector_type (0, narrow_suffix))
> +  return error_mark_node;
> +
> +return r.resolve_to (r.mode_suffix_id, type);
> +  }
> +
> +  bool
> +  check (function_checker &c) const override
> +  {
> +unsigned int bits = c.type_suffix (0).element_bits;
> +return c.require_immediate_range (2, 1, bits / 2);
> +  }
> +
> +};
> +SHAPE (binary_rshift_narrow_unsigned)
> +
>  /* xN_t vfoo[_t0](uint64_t, uint64_t)
> 
> where there are N arguments in total.
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-
> mve-builtins-shapes.h
> index d72686d187b..ca1c1017e8e 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -41,6 +41,7 @@ namespace arm_mve
>  extern const function_shape *const binary_orrq;
>  extern const function_shape *const binary_round_lshift;
>  extern const function_shape *const binary_rshift_narrow;
> +extern const function_shape *const binary_rshift_narrow_unsigned;
>  extern const function_shape *const create;
>  extern const function_shape *const inherent;
>  extern const function_shape *const unary_convert;
> --
> 2.34.1
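
The only difference from the plain binary_rshift_narrow shape is the unsigned result type: the source lanes are signed but the narrowed result saturates to an unsigned range. A scalar sketch of one lane of vqshrunbq_n_s16, based on my reading of the ACLE description (interleaving into bottom byte lanes omitted):

```python
def vqshrunb_n_s16_lane(b_lane, imm):
    """One lane of vqshrunbq_n_s16 (no rounding): shift a signed
    16-bit lane right by imm, then saturate to uint8."""
    assert 1 <= imm <= 8          # the [1..#bits/2] range the shape enforces
    r = b_lane >> imm             # arithmetic shift on Python ints
    return max(0, min(255, r))    # unsigned saturation: negatives clamp to 0

print(vqshrunb_n_s16_lane(-100, 2))  # -25 clamps to 0
print(vqshrunb_n_s16_lane(1000, 1))  # 500 clamps to 255
```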



RE: [PATCH 19/23] arm: [MVE intrinsics] factorize vqrshrunb vqrshrunt vqshrunb vqshrunt

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 19/23] arm: [MVE intrinsics] factorize vqrshrunb vqrshrunt
> vqshrunb vqshrunt
> 
> Factorize vqrshrunb, vqrshrunt, vqshrunb, vqshrunt so that they use
> existing patterns.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/iterators.md (MVE_SHRN_N): Add VQRSHRUNBQ,
>   VQRSHRUNTQ, VQSHRUNBQ, VQSHRUNTQ.
>   (MVE_SHRN_M_N): Likewise.
>   (mve_insn): Add vqrshrunb, vqrshrunt, vqshrunb, vqshrunt.
>   (isu): Add VQRSHRUNBQ, VQRSHRUNTQ, VQSHRUNBQ, VQSHRUNTQ.
>   (supf): Likewise.
>   * config/arm/mve.md (mve_vqrshrunbq_n_s): Remove.
>   (mve_vqrshruntq_n_s): Remove.
>   (mve_vqshrunbq_n_s): Remove.
>   (mve_vqshruntq_n_s): Remove.
>   (mve_vqrshrunbq_m_n_s): Remove.
>   (mve_vqrshruntq_m_n_s): Remove.
>   (mve_vqshrunbq_m_n_s): Remove.
>   (mve_vqshruntq_m_n_s): Remove.
> ---
>  gcc/config/arm/iterators.md |  32 +
>  gcc/config/arm/mve.md   | 140 +++-
>  2 files changed, 40 insertions(+), 132 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index d64c924a513..583206dac9e 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -471,8 +471,12 @@ (define_int_iterator MVE_RSHIFT_N   [
>  (define_int_iterator MVE_SHRN_N [
>VQRSHRNBQ_N_S VQRSHRNBQ_N_U
>VQRSHRNTQ_N_S VQRSHRNTQ_N_U
> +  VQRSHRUNBQ_N_S
> +  VQRSHRUNTQ_N_S
>VQSHRNBQ_N_S VQSHRNBQ_N_U
>VQSHRNTQ_N_S VQSHRNTQ_N_U
> +  VQSHRUNBQ_N_S
> +  VQSHRUNTQ_N_S
>VRSHRNBQ_N_S VRSHRNBQ_N_U
>VRSHRNTQ_N_S VRSHRNTQ_N_U
>VSHRNBQ_N_S VSHRNBQ_N_U
> @@ -482,8 +486,12 @@ (define_int_iterator MVE_SHRN_N [
>  (define_int_iterator MVE_SHRN_M_N [
>VQRSHRNBQ_M_N_S VQRSHRNBQ_M_N_U
>VQRSHRNTQ_M_N_S VQRSHRNTQ_M_N_U
> +  VQRSHRUNBQ_M_N_S
> +  VQRSHRUNTQ_M_N_S
>VQSHRNBQ_M_N_S VQSHRNBQ_M_N_U
>VQSHRNTQ_M_N_S VQSHRNTQ_M_N_U
> +  VQSHRUNBQ_M_N_S
> +  VQSHRUNTQ_M_N_S
>VRSHRNBQ_M_N_S VRSHRNBQ_M_N_U
>VRSHRNTQ_M_N_S VRSHRNTQ_M_N_U
>VSHRNBQ_M_N_S VSHRNBQ_M_N_U
> @@ -594,6 +602,10 @@ (define_int_attr mve_insn [
>(VQRSHRNBQ_N_S "vqrshrnb") (VQRSHRNBQ_N_U
> "vqrshrnb")
>(VQRSHRNTQ_M_N_S "vqrshrnt") (VQRSHRNTQ_M_N_U
> "vqrshrnt")
>(VQRSHRNTQ_N_S "vqrshrnt") (VQRSHRNTQ_N_U "vqrshrnt")
> +  (VQRSHRUNBQ_M_N_S "vqrshrunb")
> +  (VQRSHRUNBQ_N_S "vqrshrunb")
> +  (VQRSHRUNTQ_M_N_S "vqrshrunt")
> +  (VQRSHRUNTQ_N_S "vqrshrunt")
>(VQSHLQ_M_N_S "vqshl") (VQSHLQ_M_N_U "vqshl")
>(VQSHLQ_M_R_S "vqshl") (VQSHLQ_M_R_U "vqshl")
>(VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
> @@ -604,6 +616,10 @@ (define_int_attr mve_insn [
>(VQSHRNBQ_N_S "vqshrnb") (VQSHRNBQ_N_U "vqshrnb")
>(VQSHRNTQ_M_N_S "vqshrnt") (VQSHRNTQ_M_N_U
> "vqshrnt")
>(VQSHRNTQ_N_S "vqshrnt") (VQSHRNTQ_N_U "vqshrnt")
> +  (VQSHRUNBQ_M_N_S "vqshrunb")
> +  (VQSHRUNBQ_N_S "vqshrunb")
> +  (VQSHRUNTQ_M_N_S "vqshrunt")
> +  (VQSHRUNTQ_N_S "vqshrunt")
>(VQSUBQ_M_N_S "vqsub") (VQSUBQ_M_N_U "vqsub")
>(VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
>(VQSUBQ_N_S "vqsub") (VQSUBQ_N_U "vqsub")
> @@ -640,10 +656,18 @@ (define_int_attr isu[
>(VQRSHRNBQ_N_S "s") (VQRSHRNBQ_N_U "u")
>(VQRSHRNTQ_M_N_S "s") (VQRSHRNTQ_M_N_U "u")
>(VQRSHRNTQ_N_S "s") (VQRSHRNTQ_N_U "u")
> +  (VQRSHRUNBQ_M_N_S "s")
> +  (VQRSHRUNBQ_N_S "s")
> +  (VQRSHRUNTQ_M_N_S "s")
> +  (VQRSHRUNTQ_N_S "s")
>(VQSHRNBQ_M_N_S "s") (VQSHRNBQ_M_N_U "u")
>(VQSHRNBQ_N_S "s") (VQSHRNBQ_N_U "u")
>(VQSHRNTQ_M_N_S "s") (VQSHRNTQ_M_N_U "u")
>(VQSHRNTQ_N_S "s") (VQSHRNTQ_N_U "u")
> +  (VQSHRUNBQ_M_N_S "s")
> +  (VQSHRUNBQ_N_S "s")
> +  (VQSHRUNTQ_M_N_S "s")
> +  (VQSHRUNTQ_N_S "s")
>(VRSHRNBQ_M_N_S "i") (VRSHRNBQ_M_N_U "i")
>(VRSHRNBQ_N_S "i") (VRSHRNBQ_N_U "i")
>(VRSHRNTQ_M_N_S "i") (VRSHRNTQ_M_N_U "i")
> @@ -1816,6 +1840,14 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s")
> (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
>  (VQRDMULHQ_M_N_S "s")
>   

RE: [PATCH 20/23] arm: [MVE intrinsics] rework vqrshrunbq vqrshruntq vqshrunbq vqshruntq

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 20/23] arm: [MVE intrinsics] rework vqrshrunbq vqrshruntq
> vqshrunbq vqshruntq
> 
> Implement vqrshrunbq, vqrshruntq, vqshrunbq, vqshruntq using the new
> MVE builtins framework.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-base.cc
> (FUNCTION_ONLY_N_NO_U_F): New.
>   (vqshrunbq, vqshruntq, vqrshrunbq, vqrshruntq): New.
>   * config/arm/arm-mve-builtins-base.def (vqshrunbq, vqshruntq)
>   (vqrshrunbq, vqrshruntq): New.
>   * config/arm/arm-mve-builtins-base.h (vqshrunbq, vqshruntq)
>   (vqrshrunbq, vqrshruntq): New.
>   * config/arm/arm-mve-builtins.cc
>   (function_instance::has_inactive_argument): Handle vqshrunbq,
>   vqshruntq, vqrshrunbq, vqrshruntq.
>   * config/arm/arm_mve.h (vqrshrunbq): Remove.
>   (vqrshruntq): Remove.
>   (vqrshrunbq_m): Remove.
>   (vqrshruntq_m): Remove.
>   (vqrshrunbq_n_s16): Remove.
>   (vqrshrunbq_n_s32): Remove.
>   (vqrshruntq_n_s16): Remove.
>   (vqrshruntq_n_s32): Remove.
>   (vqrshrunbq_m_n_s32): Remove.
>   (vqrshrunbq_m_n_s16): Remove.
>   (vqrshruntq_m_n_s32): Remove.
>   (vqrshruntq_m_n_s16): Remove.
>   (__arm_vqrshrunbq_n_s16): Remove.
>   (__arm_vqrshrunbq_n_s32): Remove.
>   (__arm_vqrshruntq_n_s16): Remove.
>   (__arm_vqrshruntq_n_s32): Remove.
>   (__arm_vqrshrunbq_m_n_s32): Remove.
>   (__arm_vqrshrunbq_m_n_s16): Remove.
>   (__arm_vqrshruntq_m_n_s32): Remove.
>   (__arm_vqrshruntq_m_n_s16): Remove.
>   (__arm_vqrshrunbq): Remove.
>   (__arm_vqrshruntq): Remove.
>   (__arm_vqrshrunbq_m): Remove.
>   (__arm_vqrshruntq_m): Remove.
>   (vqshrunbq): Remove.
>   (vqshruntq): Remove.
>   (vqshrunbq_m): Remove.
>   (vqshruntq_m): Remove.
>   (vqshrunbq_n_s16): Remove.
>   (vqshruntq_n_s16): Remove.
>   (vqshrunbq_n_s32): Remove.
>   (vqshruntq_n_s32): Remove.
>   (vqshrunbq_m_n_s32): Remove.
>   (vqshrunbq_m_n_s16): Remove.
>   (vqshruntq_m_n_s32): Remove.
>   (vqshruntq_m_n_s16): Remove.
>   (__arm_vqshrunbq_n_s16): Remove.
>   (__arm_vqshruntq_n_s16): Remove.
>   (__arm_vqshrunbq_n_s32): Remove.
>   (__arm_vqshruntq_n_s32): Remove.
>   (__arm_vqshrunbq_m_n_s32): Remove.
>   (__arm_vqshrunbq_m_n_s16): Remove.
>   (__arm_vqshruntq_m_n_s32): Remove.
>   (__arm_vqshruntq_m_n_s16): Remove.
>   (__arm_vqshrunbq): Remove.
>   (__arm_vqshruntq): Remove.
>   (__arm_vqshrunbq_m): Remove.
>   (__arm_vqshruntq_m): Remove.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc  |  13 +
>  gcc/config/arm/arm-mve-builtins-base.def |   4 +
>  gcc/config/arm/arm-mve-builtins-base.h   |   4 +
>  gcc/config/arm/arm-mve-builtins.cc   |   4 +
>  gcc/config/arm/arm_mve.h | 320 ---
>  5 files changed, 25 insertions(+), 320 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-
> mve-builtins-base.cc
> index c95abe70239..e7d2e0abffc 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -184,6 +184,15 @@ namespace arm_mve {
>  -1, -1, -1,  
> \
>  UNSPEC##_M_N_S, UNSPEC##_M_N_U, -1))
> 
> +  /* Helper for builtins with only unspec codes, _m predicated
> + overrides, only _n version, no unsigned, no floating-point.  */
> +#define FUNCTION_ONLY_N_NO_U_F(NAME, UNSPEC) FUNCTION
>   \
> +  (NAME, unspec_mve_function_exact_insn, \
> +   (-1, -1, -1,  
> \
> +UNSPEC##_N_S, -1, -1,\
> +-1, -1, -1,  
> \
> +UNSPEC##_M_N_S, -1, -1))
> +
>  FUNCTION_WITHOUT_N (vabdq, VABDQ)
>  FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
>  FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
> @@ -203,8 +212,12 @@ FUNCTION_WITH_M_N_NO_U_F (vqrdmulhq,
> VQRDMULHQ)
>  FUNCTION_WITH_M_N_R (vqshlq, VQSHLQ)
>  FUNCTION_ONLY_N_NO_F (vqrshrnbq, VQRSHRNBQ)
>  FUNCTION_ONLY_N_NO_F (vqrshrntq, VQRSHRNTQ)
> +FUNCTION_ONLY_N_NO_U_F (vqrshrunbq, VQRSHRUNBQ)
> +FUNCTION_ONLY_N_NO_U_F (vqrshruntq, VQRSHRUNTQ)
>  FUNCTION_ONLY_N_NO_F (vqshrnbq, VQSHRNBQ)
>  FUNCTION_ONLY_N_NO_F (vqshrntq, VQSHRNTQ)
> +FUNCTION_ONLY_N_NO_U_F (vqshrunbq, VQSHRUNBQ)
> +FUNCTION_ONLY_N_NO_U_F (vqshruntq, VQSHRUNTQ)
>  FUNCTION_WITH_M_N_NO_F (vqsubq, VQSUBQ)
>  FUNCTION (vreinterpretq, vreinterpretq_impl,)
>  FUNCTION_WITHOUT_N_NO_F (vrhaddq, VRHADDQ)
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-
> mve-builtins-base.def
> i

RE: [PATCH 21/23] arm: [MVE intrinsics] add binary_rshift shape

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 21/23] arm: [MVE intrinsics] add binary_rshift shape
> 
> This patch adds the binary_rshift shape description.
> 

Ok.
Thanks,
Kyrill

> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-shapes.cc (binary_rshift): New.
>   * config/arm/arm-mve-builtins-shapes.h (binary_rshift): New.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 36 +++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  2 files changed, 37 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-
> mve-builtins-shapes.cc
> index e3bf586565c..7078f7d7220 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -365,6 +365,42 @@ struct binary_def : public overloaded_base<0>
>  };
>  SHAPE (binary)
> 
> +/* _t vfoo[_n_t0](_t, const int)
> +
> +   Shape for vector shift right operations that take a vector first
> +   argument and an integer, and produce a vector.
> +
> +   Check that 'imm' is in the [1..#bits] range.
> +
> +   Example: vrshrq.
> +   int8x16_t [__arm_]vrshrq[_n_s8](int8x16_t a, const int imm)
> +   int8x16_t [__arm_]vrshrq_m[_n_s8](int8x16_t inactive, int8x16_t a, const
> int imm, mve_pred16_t p)
> +   int8x16_t [__arm_]vrshrq_x[_n_s8](int8x16_t a, const int imm,
> mve_pred16_t p)  */
> +struct binary_rshift_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +  bool preserve_user_namespace) const override
> +  {
> +b.add_overloaded_functions (group, MODE_n,
> preserve_user_namespace);
> +build_all (b, "v0,v0,ss32", group, MODE_n, preserve_user_namespace);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +return r.resolve_uniform (1, 1);
> +  }
> +
> +  bool
> +  check (function_checker &c) const override
> +  {
> +unsigned int bits = c.type_suffix (0).element_bits;
> +return c.require_immediate_range (1, 1, bits);
> +  }
> +};
> +SHAPE (binary_rshift)
> +
>  /* _t vfoo[_t0](_t, _t)
> _t vfoo[_n_t0](_t, _t)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-
> mve-builtins-shapes.h
> index ca1c1017e8e..09e00b69e63 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -40,6 +40,7 @@ namespace arm_mve
>  extern const function_shape *const binary_opt_n;
>  extern const function_shape *const binary_orrq;
>  extern const function_shape *const binary_round_lshift;
> +extern const function_shape *const binary_rshift;
>  extern const function_shape *const binary_rshift_narrow;
>  extern const function_shape *const binary_rshift_narrow_unsigned;
>  extern const function_shape *const create;
> --
> 2.34.1



RE: [PATCH 22/23] arm: [MVE intrinsics] factorize vshrq vrshrq

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:39 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 22/23] arm: [MVE intrinsics] factorize vshrq vrshrq
> 
> Factorize vshrq and vrshrq so that they use the same pattern.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/iterators.md (MVE_VSHRQ_M_N, MVE_VSHRQ_N): New.
>   (mve_insn): Add vrshr, vshr.
>   * config/arm/mve.md (mve_vshrq_n_)
>   (mve_vrshrq_n_): Merge into ...
>   (@mve_q_n_): ... this.
>   (mve_vrshrq_m_n_, mve_vshrq_m_n_):
> Merge
>   into ...
>   (@mve_q_m_n_): ... this.
> ---
>  gcc/config/arm/iterators.md | 14 +++
>  gcc/config/arm/mve.md   | 46 +++--
>  2 files changed, 22 insertions(+), 38 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 583206dac9e..53873704174 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -408,6 +408,16 @@ (define_int_iterator MVE_INT_N_BINARY   [
>VSUBQ_N_S VSUBQ_N_U
>])
> 
> +(define_int_iterator MVE_VSHRQ_M_N [
> +  VRSHRQ_M_N_S VRSHRQ_M_N_U
> +  VSHRQ_M_N_S VSHRQ_M_N_U
> +  ])
> +
> +(define_int_iterator MVE_VSHRQ_N [
> +  VRSHRQ_N_S VRSHRQ_N_U
> +  VSHRQ_N_S VSHRQ_N_U
> +  ])
> +
>  (define_int_iterator MVE_INT_SU_N_BINARY   [
>VHADDQ_N_S VHADDQ_N_U
>VHSUBQ_N_S VHSUBQ_N_U
> @@ -636,6 +646,8 @@ (define_int_attr mve_insn [
>(VRSHRNBQ_N_S "vrshrnb") (VRSHRNBQ_N_U "vrshrnb")
>(VRSHRNTQ_M_N_S "vrshrnt") (VRSHRNTQ_M_N_U
> "vrshrnt")
>(VRSHRNTQ_N_S "vrshrnt") (VRSHRNTQ_N_U "vrshrnt")
> +  (VRSHRQ_M_N_S "vrshr") (VRSHRQ_M_N_U "vrshr")
> +  (VRSHRQ_N_S "vrshr") (VRSHRQ_N_U "vrshr")
>(VSHLQ_M_N_S "vshl") (VSHLQ_M_N_U "vshl")
>(VSHLQ_M_R_S "vshl") (VSHLQ_M_R_U "vshl")
>(VSHLQ_M_S "vshl") (VSHLQ_M_U "vshl")
> @@ -646,6 +658,8 @@ (define_int_attr mve_insn [
>(VSHRNBQ_N_S "vshrnb") (VSHRNBQ_N_U "vshrnb")
>(VSHRNTQ_M_N_S "vshrnt") (VSHRNTQ_M_N_U "vshrnt")
>(VSHRNTQ_N_S "vshrnt") (VSHRNTQ_N_U "vshrnt")
> +  (VSHRQ_M_N_S "vshr") (VSHRQ_M_N_U "vshr")
> +  (VSHRQ_N_S "vshr") (VSHRQ_N_U "vshr")
>(VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub")
> (VSUBQ_M_N_F "vsub")
>(VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F
> "vsub")
>(VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F
> "vsub")
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 20ce7ecb3d6..b5c89fd4105 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -728,18 +728,19 @@ (define_insn
> "@mve_q_"
> (set_attr "length""8")])
> 
>  ;;
> -;; [vshrq_n_s, vshrq_n_u])
> +;; [vrshrq_n_s, vrshrq_n_u]
> +;; [vshrq_n_s, vshrq_n_u]
>  ;;
>  ;; Version that takes an immediate as operand 2.
> -(define_insn "mve_vshrq_n_"
> +(define_insn "@mve_q_n_"
>[
> (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand"
> "w")
>  (match_operand:SI 2 ""
> "")]
> -  VSHRQ_N))
> +  MVE_VSHRQ_N))
>]
>"TARGET_HAVE_MVE"
> -  "vshr.\t%q0, %q1, %2"
> +  ".\t%q0, %q1, %2"
>[(set_attr "type" "mve_move")
>  ])
> 
> @@ -1401,21 +1402,6 @@ (define_insn "mve_vqshluq_n_s"
>[(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vrshrq_n_s, vrshrq_n_u])
> -;;
> -(define_insn "mve_vrshrq_n_"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> - (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand"
> "w")
> -(match_operand:SI 2 ""
> "")]
> -  VRSHRQ_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vrshr.%#\t%q0, %q1, %2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vabdq_f]
>  ;;
> @@ -4661,35 +4647,19 @@ (define_insn
> "@mve_q_m_n_"
> 
>  ;;
>  ;; [vrshrq_m_n_s, vrshrq_m_n_u])
> -;;
> -(define_insn "mve_vrshrq_m_n_"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> - (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -(match_operand:MVE_2 2 "s_register_operand" "w")
> -(match_operand:SI 3 ""
> "")
> -(match_operand: 4
> "vpr_register_operand" "Up")]
> -  VRSHRQ_M_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vrshrt.%#\t%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
>  ;; [vshrq_m_n_s, vshrq_m_n_u])
>  ;;
> -(define_insn "mve_vshrq_m_n_"
> +(define_insn "@mve_q_m_n_"
>[
> (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>   (unspec:MVE_

RE: [PATCH 23/23] arm: [MVE intrinsics] rework vshrq vrshrq

2023-05-05 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Friday, May 5, 2023 9:40 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 23/23] arm: [MVE intrinsics] rework vshrq vrshrq
> 
> Implement vshrq and vrshrq using the new MVE builtins framework.

Ok.
Looking forward to more of the transition!
Kyrill

> 
> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-base.cc (vrshrq, vshrq): New.
>   * config/arm/arm-mve-builtins-base.def (vrshrq, vshrq): New.
>   * config/arm/arm-mve-builtins-base.h (vrshrq, vshrq): New.
>   * config/arm/arm_mve.h (vshrq): Remove.
>   (vrshrq): Remove.
>   (vrshrq_m): Remove.
>   (vshrq_m): Remove.
>   (vrshrq_x): Remove.
>   (vshrq_x): Remove.
>   (vshrq_n_s8): Remove.
>   (vshrq_n_s16): Remove.
>   (vshrq_n_s32): Remove.
>   (vshrq_n_u8): Remove.
>   (vshrq_n_u16): Remove.
>   (vshrq_n_u32): Remove.
>   (vrshrq_n_u8): Remove.
>   (vrshrq_n_s8): Remove.
>   (vrshrq_n_u16): Remove.
>   (vrshrq_n_s16): Remove.
>   (vrshrq_n_u32): Remove.
>   (vrshrq_n_s32): Remove.
>   (vrshrq_m_n_s8): Remove.
>   (vrshrq_m_n_s32): Remove.
>   (vrshrq_m_n_s16): Remove.
>   (vrshrq_m_n_u8): Remove.
>   (vrshrq_m_n_u32): Remove.
>   (vrshrq_m_n_u16): Remove.
>   (vshrq_m_n_s8): Remove.
>   (vshrq_m_n_s32): Remove.
>   (vshrq_m_n_s16): Remove.
>   (vshrq_m_n_u8): Remove.
>   (vshrq_m_n_u32): Remove.
>   (vshrq_m_n_u16): Remove.
>   (vrshrq_x_n_s8): Remove.
>   (vrshrq_x_n_s16): Remove.
>   (vrshrq_x_n_s32): Remove.
>   (vrshrq_x_n_u8): Remove.
>   (vrshrq_x_n_u16): Remove.
>   (vrshrq_x_n_u32): Remove.
>   (vshrq_x_n_s8): Remove.
>   (vshrq_x_n_s16): Remove.
>   (vshrq_x_n_s32): Remove.
>   (vshrq_x_n_u8): Remove.
>   (vshrq_x_n_u16): Remove.
>   (vshrq_x_n_u32): Remove.
>   (__arm_vshrq_n_s8): Remove.
>   (__arm_vshrq_n_s16): Remove.
>   (__arm_vshrq_n_s32): Remove.
>   (__arm_vshrq_n_u8): Remove.
>   (__arm_vshrq_n_u16): Remove.
>   (__arm_vshrq_n_u32): Remove.
>   (__arm_vrshrq_n_u8): Remove.
>   (__arm_vrshrq_n_s8): Remove.
>   (__arm_vrshrq_n_u16): Remove.
>   (__arm_vrshrq_n_s16): Remove.
>   (__arm_vrshrq_n_u32): Remove.
>   (__arm_vrshrq_n_s32): Remove.
>   (__arm_vrshrq_m_n_s8): Remove.
>   (__arm_vrshrq_m_n_s32): Remove.
>   (__arm_vrshrq_m_n_s16): Remove.
>   (__arm_vrshrq_m_n_u8): Remove.
>   (__arm_vrshrq_m_n_u32): Remove.
>   (__arm_vrshrq_m_n_u16): Remove.
>   (__arm_vshrq_m_n_s8): Remove.
>   (__arm_vshrq_m_n_s32): Remove.
>   (__arm_vshrq_m_n_s16): Remove.
>   (__arm_vshrq_m_n_u8): Remove.
>   (__arm_vshrq_m_n_u32): Remove.
>   (__arm_vshrq_m_n_u16): Remove.
>   (__arm_vrshrq_x_n_s8): Remove.
>   (__arm_vrshrq_x_n_s16): Remove.
>   (__arm_vrshrq_x_n_s32): Remove.
>   (__arm_vrshrq_x_n_u8): Remove.
>   (__arm_vrshrq_x_n_u16): Remove.
>   (__arm_vrshrq_x_n_u32): Remove.
>   (__arm_vshrq_x_n_s8): Remove.
>   (__arm_vshrq_x_n_s16): Remove.
>   (__arm_vshrq_x_n_s32): Remove.
>   (__arm_vshrq_x_n_u8): Remove.
>   (__arm_vshrq_x_n_u16): Remove.
>   (__arm_vshrq_x_n_u32): Remove.
>   (__arm_vshrq): Remove.
>   (__arm_vrshrq): Remove.
>   (__arm_vrshrq_m): Remove.
>   (__arm_vshrq_m): Remove.
>   (__arm_vrshrq_x): Remove.
>   (__arm_vshrq_x): Remove.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc  |   2 +
>  gcc/config/arm/arm-mve-builtins-base.def |   2 +
>  gcc/config/arm/arm-mve-builtins-base.h   |   2 +
>  gcc/config/arm/arm_mve.h | 628 ---
>  4 files changed, 6 insertions(+), 628 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-
> mve-builtins-base.cc
> index e7d2e0abffc..bb585a3921f 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -225,9 +225,11 @@ FUNCTION_WITHOUT_N_NO_F (vrmulhq,
> VRMULHQ)
>  FUNCTION_WITH_M_N_NO_F (vrshlq, VRSHLQ)
>  FUNCTION_ONLY_N_NO_F (vrshrnbq, VRSHRNBQ)
>  FUNCTION_ONLY_N_NO_F (vrshrntq, VRSHRNTQ)
> +FUNCTION_ONLY_N_NO_F (vrshrq, VRSHRQ)
>  FUNCTION_WITH_M_N_R (vshlq, VSHLQ)
>  FUNCTION_ONLY_N_NO_F (vshrnbq, VSHRNBQ)
>  FUNCTION_ONLY_N_NO_F (vshrntq, VSHRNTQ)
> +FUNCTION_ONLY_N_NO_F (vshrq, VSHRQ)
>  FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
>  FUNCTION (vuninitializedq, vuninitializedq_impl,)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-
> mve-builtins-base.def
> index 50cb2d055e9..33c95c02396 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.def
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -51,10 +51,12 @@ DEF_MVE_FUNCTION (vrmulhq, binary, all_integer,
> mx_or_none)
>  DEF_MVE_FUNCTION (vrshlq, binary_round_lshif

[Patch] GCN: Silence unused-variable warning

2023-05-05 Thread Tobias Burnus

Probably added for symmetry with out_mode/out_n, but in the end not used.
That function was added in commit
  r13-6423-gce9cd7258d0 amdgcn: Enable SIMD vectorization of math functions

Tested the removal by building with that patch applied.
OK for mainline?

Tobias
GCN: Silence unused-variable warning

gcc/ChangeLog:

	* config/gcn/gcn.cc (gcn_vectorize_builtin_vectorized_function): Remove
	unused in_mode/in_n variables.

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 5608d85a1a0..7bb71392c4c 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5361,8 +5361,6 @@ gcn_vectorize_builtin_vectorized_function (unsigned int fn, tree type_out,
 
   machine_mode out_mode = TYPE_MODE (TREE_TYPE (type_out));
   int out_n = TYPE_VECTOR_SUBPARTS (type_out);
-  machine_mode in_mode = TYPE_MODE (TREE_TYPE (type_in));
-  int in_n = TYPE_VECTOR_SUBPARTS (type_in);
   combined_fn cfn = combined_fn (fn);
 
   /* Keep this consistent with the list of vectorized math routines.  */


Re: [Patch] GCN: Silence unused-variable warning

2023-05-05 Thread Andrew Stubbs

On 05/05/2023 12:10, Tobias Burnus wrote:

Probably added for symmetry with out_mode/out_n but at the end not used.
That function was added in commit
   r13-6423-gce9cd7258d0 amdgcn: Enable SIMD vectorization of math 
functions


Tested the removal by building with that patch applied.
OK for mainline?


OK.

Andrew


Re: [PATCH] tree: Fix up save_expr [PR52339]

2023-05-05 Thread Jason Merrill via Gcc-patches

On 5/5/23 06:45, Jakub Jelinek wrote:

On Fri, May 05, 2023 at 11:55:41AM +0200, Jakub Jelinek via Gcc-patches wrote:

Looking at the Ada cases (I admit I don't really understand why it isn't
vectorized, the IL is so different from the start because of the extra
SAVE_EXPRs that it is very hard to diff stuff), the case where save_expr
used to return the argument and no longer does are those
r.P_BOUNDS->LB0
etc. cases.  Now, I wondered if (pre-gimplification) we couldn't make an
exception and allow the base to be INDIRECT_REF or of a REFERENCE_TYPE
with the idea that references are really immutable and can't be changed
during their lifetime (after gimplification whether something is
REFERENCE_TYPE or POINTER_TYPE is lost), but that isn't what Ada is using.


And anyway, a reference can also refer to a non-const object.


So, another possibility would be to allow bases of TREE_READONLY (t) &&
!TREE_SIDE_EFFECTS (t) which are INDIRECT_REFs of tree_invariant_p_1
addresses.  That doesn't work either, in the r.P_BOUNDS->LB0 case
P_BOUNDS is a FIELD_DECL with POINTER_TYPE, LB0 is a TREE_READONLY FIELD_DECL
and that COMPONENT_REF is also TREE_READONLY, r is a TREE_READONLY PARM_DECL,
but unfortunately the r.P_BOUNDS COMPONENT_REF isn't marked TREE_READONLY.


And an invariant pointer can point to a non-const object.


Thus, shall we treat as tree_invariant_p_1 also handled components which
are !TREE_SIDE_EFFECTS (t), but not TREE_READONLY and only their base
is TREE_READONLY?  Or do that only during the recursion?

But doing that feels quite risky.  While the following version of
the patch avoids the Ada regressions, the fact that we don't miscompile
the pr52339-1.c testcase modified to have
int
foo (const struct S *const p, struct S *q)
rather than
int
foo (const struct S *p, struct S *q)
is only because the FE happens to add there some useless cast in between.
While the pointer is invariant, I'm afraid nothing guarantees it goes out
of scope in between multiple uses of the expression returned by save_expr.


Right.


2023-05-05  Jakub Jelinek  

PR c++/52339
* tree.cc (tree_invariant_p_1): For TREE_READONLY (t) without
side-effects, only return true if DECL_P (get_base_address (t)).

* g++.dg/opt/pr52339.C: New test.
* gcc.c-torture/execute/pr52339-1.c: New test.
* gcc.c-torture/execute/pr52339-2.c: New test.

--- gcc/tree.cc.jj  2023-05-01 09:59:46.686293833 +0200
+++ gcc/tree.cc 2023-05-05 12:34:26.989523468 +0200
@@ -3876,10 +3876,26 @@ tree_invariant_p_1 (tree t)
  {
tree op;
  
-  if (TREE_CONSTANT (t)
-      || (TREE_READONLY (t) && !TREE_SIDE_EFFECTS (t)))
+  if (TREE_CONSTANT (t))
     return true;
 
+  if (TREE_READONLY (t) && !TREE_SIDE_EFFECTS (t))
+    {
+  /* Return true for const qualified vars, but for members or array
+elements without side-effects return true only if the base
+object is a decl.  If the base is e.g. a pointer dereference,
+what the pointer points to could be deallocated or the pointer
+could be changed.  See PR52339.  */
+  tree base = get_base_address (t);
+  if (DECL_P (base))
+   return true;


So I think the above is correct.


+  /* As an exception, allow pointer dereferences as long as the pointer
+is invariant.  */
+  if (TREE_CODE (base) == INDIRECT_REF
+      && tree_invariant_p_1 (get_base_address (TREE_OPERAND (base, 0))))
+   return true;


And this is unsafe.


+}
+
switch (TREE_CODE (t))
  {
  case SAVE_EXPR:
--- gcc/testsuite/g++.dg/opt/pr52339.C.jj   2023-05-04 15:23:20.459935705 +0200
+++ gcc/testsuite/g++.dg/opt/pr52339.C  2023-05-04 15:22:35.640578681 +0200
@@ -0,0 +1,19 @@
+// PR c++/52339
+// { dg-do run { target c++11 } }
+
+
+struct B;
+struct A { B *b; };
+struct B {
+  A *a;
+  B () : a(new A{this}) {}
+  ~B () { delete a; }
+};
+
+int
+main ()
+{
+  B *b = new B;
+  const A *a = b->a;
+  delete a->b;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr52339-1.c.jj  2023-05-04 15:22:59.177241023 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr52339-1.c 2023-05-04 15:20:19.820527142 +0200
@@ -0,0 +1,29 @@
+/* PR c++/52339 */
+
+struct S { int a; };
+
+void
+bar (int *p, struct S *q)
+{
+  __builtin_free (q);
+}
+
+int
+foo (const struct S *p, struct S *q)
+{
+  int b[p->a];
+  bar (b, q);
+  return sizeof (b);
+}
+
+int
+main ()
+{
+  struct S *p = __builtin_malloc (sizeof (struct S));
+  if (!p)
+return 0;
+  p->a = 42;
+  if (foo (p, p) != 42 * sizeof (int))
+__builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr52339-2.c.jj  2022-11-21 10:04:00.210677046 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr52339-2.c 2023-05-04 19:34:08.581686806 +0200
@@ -0,0 +1,20 @@
+/* PR c++/52339 */
+
+struct S { int a; };
+
+int
+foo (const struct S *p)
+{
+  int b[p->a];
+  ++p;
+  return sizeof (b);
+}
+
+int
+main ()
+{
+  struct S s[] = { { 42 }, { 43 } };
+  if (foo (s) != 42 * sizeof

Re: [PATCH] gimple-range-op: Improve handling of sqrt ranges

2023-05-05 Thread Mikael Morin

Hello,

On 05/05/2023 at 10:00, Jakub Jelinek via Gcc-patches wrote:

Hi!

The previous patch just added basic intrinsic ranges for sqrt
([-0.0, +Inf] +-NAN being the general result range of the function
and [-0.0, +Inf] the general operand range if result isn't NAN etc.),
the following patch intersects those ranges with particular range
computed from argument or result's exact range with the expected
error in ulps taken into account and adds a function (frange_arithmetic
variant) which can be used by other functions as well as helper.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I think there is something wrong with the handling of NAN and lower 
bounds in op1_range.  If the lhs has positive lower bound and can be 
nan, op1 should have -inf lower bound regardless of lhs's lower bound value.


Maybe it would be less error prone to combine the ordered range part 
(r2) and the unordered one (r) with union instead of intersection?


The rest looks good.  There is the case of slightly negative lhs bounds 
coming (through ulps) from slightly positive op1 that could get better 
than infinite bounds for op1 range, but at least it is conservatively 
correct, and I'm not sure it's that important anyway.


Mikael


Re: [PATCH] Fortran: overloading of intrinsic binary operators [PR109641]

2023-05-05 Thread Mikael Morin

Hello,

On 01/05/2023 at 18:29, Harald Anlauf via Fortran wrote:

Dear all,

the attached patch is mostly self-explaining: we mishandled the
overloading of intrinsic binary operators in the case the actual
operands were of intrinsic numeric type and the ranks of the
operands were not conformable, i.e. both were of non-zero and
different ranks.  In that case the operators could be converted
to the same type before the correct user-defined operator was
resolved, leading to either rejects-valid or accepts-invalid
or wrong resolution (= wrong code).

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

The patch is actually very limited in impact, but the bug is
sort of annoying.  Would it be OK to backport to 13.2 after
some waiting?

Thanks,
Harald

(...)

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index c3d508fb45d..341909d7de7 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -5644,6 +5666,23 @@ done:
 }


+/* Given two expressions, check that their rank is conformable, i.e. either
+   both have the same rank or at least one is a scalar.  */
+
+bool
+gfc_op_rank_conformable (gfc_expr *op1, gfc_expr *op2)
+{
+//  if (op1->expr_type == EXPR_VARIABLE && op1->ref)

Please remove this, and the other one below.


+  if (op1->expr_type == EXPR_VARIABLE)
+gfc_expression_rank (op1);
+//  if (op2->expr_type == EXPR_VARIABLE && op2->ref)
+  if (op2->expr_type == EXPR_VARIABLE)
+gfc_expression_rank (op2);
+
+  return (op1->rank == 0 || op2->rank == 0 || op1->rank == op2->rank);
+}
+
+
 static void
 add_caf_get_intrinsic (gfc_expr *e)
 {


The rest looks good.
OK for master, and backport as well.

Thanks
Mikael


Re: [PATCH] gimple-range-op: Improve handling of sqrt ranges

2023-05-05 Thread Jakub Jelinek via Gcc-patches
On Fri, May 05, 2023 at 01:38:48PM +0200, Mikael Morin wrote:
> I think there is something wrong with the handling of NAN and lower bounds
> in op1_range.  If the lhs has positive lower bound and can be nan, op1
> should have -inf lower bound regardless of lhs's lower bound value.

Oops, you're right, will retest with
--- gcc/gimple-range-op.cc  2023-05-04 19:58:44.842606865 +0200
+++ gcc/gimple-range-op.cc  2023-05-05 13:53:58.742406160 +0200
@@ -508,6 +508,7 @@
   ub = dconstinf;
 frange r2;
 r2.set (type, lb, ub);
+r2.flush_denormals_to_zero ();
 r.intersect (r2);
 return true;
   }
@@ -563,7 +564,7 @@
   return true;
 REAL_VALUE_TYPE lb = lhs.lower_bound ();
 REAL_VALUE_TYPE ub = lhs.upper_bound ();
-if (real_less (&dconst0, &lb))
+if (!lhs.maybe_isnan () && real_less (&dconst0, &lb))
   {
for (unsigned i = 0; i < ulps; ++i)
  frange_nextafter (TYPE_MODE (type), lb, dconstninf);
incremental change (the first hunk because fold_range is a forward operation
and could in some configurations flush denormals to zero).
Thanks for catching that.

> Maybe it would be less error prone to combine the ordered range part (r2)
> and the unordered one (r) with union instead of intersection?

There are 2 reasons for the intersection rather than tweaking the bounds
immediately.  First, ulps might be larger than bulps (the boundary ones) but
the boundary ulps should take priority on the boundaries: if say bulps is
2ulps and ulps 10ulps and sqrt (lb) is 0.0 + 4ulps, then the range should be
[-0.0 - 2ulps, ...] rather than [-0.0 - 6ulps, ...].  Second, to help readers
determine which part is the range from the intrinsic boundaries and which is
the range from specific values, one can put breakpoints around the
intersection and print both ranges and the result etc.

> The rest looks good.  There is the case of slightly negative lhs bounds
> coming (through ulps) from slightly positive op1 that could get better than
> infinite bounds for op1 range, but at least it is conservatively correct,
> and I'm not sure it's that important anyway.

Currently bulps for sqrt in glibc is always 0; I haven't discovered any case
of sqrt actually returning a value < -0.0, so sqrt bulps is 0 everywhere
except for ~0U (unknown, currently used for all non-glibc targets).  That is just in
case some other implementation returns such values.  For sin/cos
unfortunately even glibc sometimes returns > 1.0 or < -1.0 values in
non-default rounding modes.

Jakub



[PATCH] i386: Introduce mulv2si3 instruction

2023-05-05 Thread Uros Bizjak via Gcc-patches
For SSE2 targets the expander unpacks the input elements into the correct
positions in a V4SI vector and emits the PMULUDQ instruction.  The output
elements are then shuffled back to their positions in the V2SI vector.

For SSE4 targets the PMULLD instruction is emitted directly.

gcc/ChangeLog:

* config/i386/mmx.md (mulv2si3): New expander.
(*mulv2si3): New insn pattern.

gcc/testsuite/ChangeLog:

* gcc.target/i386/sse2-mmx-mult-vec.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 872ddbc55f2..6dd203f4fa8 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2092,6 +2092,55 @@ (define_insn "*3"
(set_attr "type" "sseadd")
(set_attr "mode" "TI")])
 
+(define_expand "mulv2si3"
+  [(set (match_operand:V2SI 0 "register_operand")
+   (mult:V2SI
+ (match_operand:V2SI 1 "register_operand")
+ (match_operand:V2SI 2 "register_operand")))]
+  "TARGET_MMX_WITH_SSE"
+{
+  if (!TARGET_SSE4_1)
+{
+  rtx op1 = lowpart_subreg (V4SImode, force_reg (V2SImode, operands[1]),
+   V2SImode);
+  rtx op2 = lowpart_subreg (V4SImode, force_reg (V2SImode, operands[2]),
+   V2SImode);
+
+  rtx tmp1 = gen_reg_rtx (V4SImode);
+  emit_insn (gen_vec_interleave_lowv4si (tmp1, op1, op1));
+  rtx tmp2 = gen_reg_rtx (V4SImode);
+  emit_insn (gen_vec_interleave_lowv4si (tmp2, op2, op2));
+
+  rtx res = gen_reg_rtx (V2DImode);
+  emit_insn (gen_vec_widen_umult_even_v4si (res, tmp1, tmp2));
+
+  rtx op0 = gen_reg_rtx (V4SImode);
+  emit_insn (gen_sse2_pshufd_1 (op0, gen_lowpart (V4SImode, res),
+   const0_rtx, const2_rtx,
+   const0_rtx, const2_rtx));
+
+  emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0, V4SImode));
+  DONE;
+}
+})
+
+(define_insn "*mulv2si3"
+  [(set (match_operand:V2SI 0 "register_operand" "=Yr,*x,v")
+   (mult:V2SI
+ (match_operand:V2SI 1 "register_operand" "%0,0,v")
+ (match_operand:V2SI 2 "register_operand" "Yr,*x,v")))]
+  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+  "@
+   pmulld\t{%2, %0|%0, %2}
+   pmulld\t{%2, %0|%0, %2}
+   vpmulld\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "sseimul")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "btver2_decode" "vector")
+   (set_attr "mode" "TI")])
+
 (define_expand "mmx_mulv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
 (mult:V4HI (match_operand:V4HI 1 "register_mmxmem_operand")
diff --git a/gcc/testsuite/gcc.target/i386/sse2-mmx-mult-vec.c b/gcc/testsuite/gcc.target/i386/sse2-mmx-mult-vec.c
new file mode 100644
index 000..cdc9a7bb8bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-mmx-mult-vec.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+/* { dg-require-effective-target sse2 } */
+
+#include "sse2-check.h"
+
+#define N 2
+
+int a[N] = {-287807, 604344};
+int b[N] = {474362, 874120};
+int r[N];
+
+int rc[N] = {914249338, -11800128};
+
+static void
+sse2_test (void)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+r[i] = a[i] * b[i];
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+if (r[i] != rc[i])
+  abort ();
+}


[PATCH] tree-optimization/109735 - conversion for vectorized pointer-diff

2023-05-05 Thread Richard Biener via Gcc-patches
There's handling in vectorizable_operation for POINTER_DIFF_EXPR
requiring conversion of the result of the unsigned operation to
a signed type.  But that's conditional on the "default" kind of
vectorization.  This PR shows that the emulated vector path
needs it, and I think the masked operation case will, too (though
we might eventually never mask an integral MINUS_EXPR).  So the
following makes that handling unconditional.
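The required sequence can be modelled in scalar C (a hedged sketch of what is
emitted per element, not the vectorizer code itself):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of POINTER_DIFF_EXPR lowering: subtract in the unsigned type,
   where wraparound is well defined, and then convert the result back
   to the signed pointer-difference type.  It is this final conversion
   that vectorizable_operation must emit on every code path.  */
static ptrdiff_t
pointer_diff (const char *p, const char *q)
{
  uintptr_t u = (uintptr_t) p - (uintptr_t) q; /* unsigned MINUS_EXPR */
  return (ptrdiff_t) u;                        /* conversion to signed */
}
```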

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/109735
* tree-vect-stmts.cc (vectorizable_operation): Perform
conversion for POINTER_DIFF_EXPR unconditionally.
---
 gcc/tree-vect-stmts.cc | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cf5194ea444..61a2da4ecee 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6657,8 +6657,8 @@ vectorizable_operation (vec_info *vinfo,
  new_stmt = gimple_build_assign (NULL_TREE, VIEW_CONVERT_EXPR,
  build1 (VIEW_CONVERT_EXPR,
  vectype, result_low));
- result_low = make_ssa_name (vectype);
- gimple_assign_set_lhs (new_stmt, result_low);
+ new_temp = make_ssa_name (vectype);
+ gimple_assign_set_lhs (new_stmt, new_temp);
  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
}
   else if (masked_loop_p && mask_out_inactive)
@@ -6734,18 +6734,19 @@ vectorizable_operation (vec_info *vinfo,
 AND it with a loop mask again.  */
  if (mask)
loop_vinfo->vec_cond_masked_set.add ({ new_temp, mask });
+   }
 
- if (vec_cvt_dest)
-   {
- new_temp = build1 (VIEW_CONVERT_EXPR, vectype_out, new_temp);
- new_stmt = gimple_build_assign (vec_cvt_dest, VIEW_CONVERT_EXPR,
- new_temp);
- new_temp = make_ssa_name (vec_cvt_dest, new_stmt);
- gimple_assign_set_lhs (new_stmt, new_temp);
- vect_finish_stmt_generation (vinfo, stmt_info,
-  new_stmt, gsi);
-   }
+  if (vec_cvt_dest)
+   {
+ new_temp = build1 (VIEW_CONVERT_EXPR, vectype_out, new_temp);
+ new_stmt = gimple_build_assign (vec_cvt_dest, VIEW_CONVERT_EXPR,
+ new_temp);
+ new_temp = make_ssa_name (vec_cvt_dest, new_stmt);
+ gimple_assign_set_lhs (new_stmt, new_temp);
+ vect_finish_stmt_generation (vinfo, stmt_info,
+  new_stmt, gsi);
}
+
   if (slp_node)
SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
   else
-- 
2.35.3


RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-05-05 Thread Li, Pan2 via Gcc-patches
Hi kito,

Could you please share any suggestions about the patch, comparing the V1 
and V2?

Pan


-Original Message-
From: Li, Pan2 
Sent: Wednesday, May 3, 2023 7:18 PM
To: Jeff Law ; Kito Cheng 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
; Andrew Waterman 
Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

Thanks all for comments, will work with kito to make it happen.

Pan

-Original Message-
From: Jeff Law  
Sent: Wednesday, May 3, 2023 12:28 AM
To: Kito Cheng 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; Wang, Yanzhang ; Andrew Waterman 

Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/29/23 19:40, Kito Cheng wrote:
> Hi Jeff:
> 
> The RTL pattern already models tail element and vector length well, so 
> I don't feel the first version of Pan's patch has any problem?
> 
> Input RTL pattern:
> 
> #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> #(if_then_else:VNx2BI (unspec:VNx2BI [
> #(const_vector:VNx2BI repeat [
> #(const_int 1 [0x1])
> #])  # all-1 mask
> #(reg:DI 143)  # AVL reg, or vector length
> #(const_int 2 [0x2]) # mask policy
> #(const_int 0 [0])   # avl type
> #(reg:SI 66 vl)
> #(reg:SI 67 vtype)
> #] UNSPEC_VPREDICATE)
> #(geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> #(reg/v:VNx2QI 137 [ v1 ]))
> #(unspec:VNx2BI [
> #(reg:SI 0 zero)
> #] UNSPEC_VUNDEF))) # maskoff and tail operand
> # (expr_list:REG_DEAD (reg:DI 143)
> #(expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> #(nil
> 
> And the split pattern, only did on tail/maskoff element with undefined value:
> 
> (define_split
>   [(set (match_operand:VB  0 "register_operand")
> (if_then_else:VB
>   (unspec:VB
> [(match_operand:VB 1 "vector_all_trues_mask_operand")
>  (match_operand 4 "vector_length_operand")
>  (match_operand 5 "const_int_operand")
>  (match_operand 6 "const_int_operand")
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (match_operand:VB 3 "vector_move_operand")
>   (match_operand:VB 2 "vector_undef_operand")))] # maskoff
> and tail operand, only match undef value
> 
> Then it turns into vmset, and also discard mask policy operand (since 
> maskoff is undef means don't care IMO):
> 
> (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> (if_then_else:VNx2BI (unspec:VNx2BI [
> (const_vector:VNx2BI repeat [
> (const_int 1 [0x1])
> ])  # all-1 mask
> (reg:DI 143) # AVL reg, or vector length
> (const_int 2 [0x2]) # mask policy
> (reg:SI 66 vl)
> (reg:SI 67 vtype)
> ] UNSPEC_VPREDICATE)
> (const_vector:VNx2BI repeat [
> (const_int 1 [0x1])
> ])# all-1
> (unspec:VNx2BI [
> (reg:SI 0 zero)
> ] UNSPEC_VUNDEF))) # still vundef
>  (expr_list:REG_DEAD (reg:DI 143)
> (nil)))
Right.  My concern is that when we call relational_result it's going to return 
-1 (as a vector of bools) which bubbles up through the call 
chain.   If that doesn't match the actual register state after the 
instruction (irrespective of the tail policy), then we have the potential to 
generate incorrect code.

For example, if there's a subsequent instruction that tried to set a vector 
register to -1, it could just copy from the destination of the vmset to the new 
target.  But if the vmset didn't set all the bits to 1, then the code is wrong.
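The concern can be modelled abstractly (a hypothetical sketch; the function
name and the element encoding are invented for illustration):

```c
#include <assert.h>

#define VLMAX 4

/* Abstract model of vmset under a tail-undefined policy: only the
   first vl elements are guaranteed to be 1; the tail keeps whatever
   was in the destination (modelled here by `undef`).  A simplifier
   that folds the result to the all-ones constant vector would then be
   wrong whenever the tail is not actually all ones.  */
static void
vmset_tail_undef (int vl, const int *undef, int *dest)
{
  for (int i = 0; i < VLMAX; i++)
    dest[i] = i < vl ? 1 : undef[i];
}
```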

With all the UNSPECs in place, this may not be a problem in practice. 
Unsure.  I'm willing to defer to you on this Kito.

Jeff


Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-05-05 Thread Kito Cheng via Gcc-patches
I will take V1 and commit to trunk after my local test is done :)

On Fri, May 5, 2023 at 8:30 PM Li, Pan2  wrote:
>
> Hi kito,
>
> Could you please help to share any suggestion about the PATCH? Comparing the 
> V1 and V2.
>
> Pan
>
>


RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-05-05 Thread Li, Pan2 via Gcc-patches
Ok, sounds good. Thank you!

Pan

-Original Message-
From: Kito Cheng  
Sent: Friday, May 5, 2023 8:37 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

I will take V1 and commit to trunk after my local test is done :)

On Fri, May 5, 2023 at 8:30 PM Li, Pan2  wrote:
>
> Hi kito,
>
> Could you please help to share any suggestion about the PATCH? Comparing the 
> V1 and V2.
>
> Pan
>
>


RE: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Tamar Christina via Gcc-patches
>  This looks pretty reasonable to me.  Are there any patches left in
>  this series that need review?  I'm very much looking forward to
>  build time provements related to this patch, particularly for
>  targets that I bootstrap with qemu emulation -- we take multiple
>  hours to build gimple-match and the ability to parallelize those
>  component
> >> builds should be a significant win.
> >>>
> >>> Hi,
> >>>
> >>> No this is the last one, Richi already approved the rest but he
> >>> didn't feel he had enough knowledge about the build system to say if
> >>> this code was portable enough.
> >>
> >> I'm looking forward to this going as well for improved bootstrap
> >> times, thanks for working on this!
> >>
> >>>
> >>> So just waiting on this one and can commit the series.
> >>
> >> Can we treat Jeff's LGTM above as an ok given his global reviewer position?
> >
> > Ah I didn't treat it as such as it wasn't in reply to the "ok for
> > master" part. But perhaps I misunderstood.  In case it wasn't, this is
> > also a PING for the *.in files maintainers.
> My message was fairly ambiguous.   I just gave it another once-over
> and I'll give an explicit OK for the trunk.
> 

Merci!

I'll go to the next bottleneck then.

Thanks!
Tamar

> Jeff


[PATCH] i386: Rename index_register_operand predicate to register_no_SP_operand

2023-05-05 Thread Uros Bizjak via Gcc-patches
Rename the index_register_operand predicate to reflect what it really does.

No functional change.

gcc/ChangeLog:

* config/i386/predicates.md (register_no_SP_operand):
Rename from index_register_operand.
(call_register_operand): Update for rename.
* config/i386/i386.md (*lea_general_[1234]): Update for rename.

Bootstrapped on x86_64-linux-gnu.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 63207fc9305..cf90867b801 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -7068,7 +7068,7 @@ (define_expand "uaddv4"
 (define_insn_and_split "*lea_general_1"
   [(set (match_operand:SWI12 0 "register_operand" "=r")
(plus:SWI12
- (plus:SWI12 (match_operand:SWI12 1 "index_register_operand" "l")
+ (plus:SWI12 (match_operand:SWI12 1 "register_no_SP_operand" "l")
  (match_operand:SWI12 2 "register_operand" "r"))
  (match_operand:SWI12 3 "immediate_operand" "i")))]
   "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
@@ -7090,7 +7090,7 @@ (define_insn_and_split "*lea_general_1"
 (define_insn_and_split "*lea_general_2"
   [(set (match_operand:SWI12 0 "register_operand" "=r")
(plus:SWI12
- (mult:SWI12 (match_operand:SWI12 1 "index_register_operand" "l")
+ (mult:SWI12 (match_operand:SWI12 1 "register_no_SP_operand" "l")
  (match_operand 2 "const248_operand" "n"))
  (match_operand:SWI12 3 "nonmemory_operand" "ri")))]
   "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
@@ -7111,7 +7111,7 @@ (define_insn_and_split "*lea_general_2"
 (define_insn_and_split "*lea_general_2b"
   [(set (match_operand:SWI12 0 "register_operand" "=r")
(plus:SWI12
- (ashift:SWI12 (match_operand:SWI12 1 "index_register_operand" "l")
+ (ashift:SWI12 (match_operand:SWI12 1 "register_no_SP_operand" "l")
(match_operand 2 "const123_operand" "n"))
  (match_operand:SWI12 3 "nonmemory_operand" "ri")))]
   "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
@@ -7133,7 +7133,7 @@ (define_insn_and_split "*lea_general_3"
   [(set (match_operand:SWI12 0 "register_operand" "=r")
(plus:SWI12
  (plus:SWI12
-   (mult:SWI12 (match_operand:SWI12 1 "index_register_operand" "l")
+   (mult:SWI12 (match_operand:SWI12 1 "register_no_SP_operand" "l")
(match_operand 2 "const248_operand" "n"))
(match_operand:SWI12 3 "register_operand" "r"))
  (match_operand:SWI12 4 "immediate_operand" "i")))]
@@ -7159,7 +7159,7 @@ (define_insn_and_split "*lea_general_3b"
   [(set (match_operand:SWI12 0 "register_operand" "=r")
(plus:SWI12
  (plus:SWI12
-   (ashift:SWI12 (match_operand:SWI12 1 "index_register_operand" "l")
+   (ashift:SWI12 (match_operand:SWI12 1 "register_no_SP_operand" "l")
  (match_operand 2 "const123_operand" "n"))
(match_operand:SWI12 3 "register_operand" "r"))
  (match_operand:SWI12 4 "immediate_operand" "i")))]
@@ -7185,7 +7185,7 @@ (define_insn_and_split "*lea_general_4"
   [(set (match_operand:SWI12 0 "register_operand" "=r")
(any_or:SWI12
  (ashift:SWI12
-   (match_operand:SWI12 1 "index_register_operand" "l")
+   (match_operand:SWI12 1 "register_no_SP_operand" "l")
(match_operand 2 "const_0_to_3_operand"))
  (match_operand 3 "const_int_operand")))]
   "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
@@ -7209,7 +7209,7 @@ (define_insn_and_split "*lea_general_4"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(any_or:SWI48
  (ashift:SWI48
-   (match_operand:SWI48 1 "index_register_operand" "l")
+   (match_operand:SWI48 1 "register_no_SP_operand" "l")
(match_operand 2 "const_0_to_3_operand"))
  (match_operand 3 "const_int_operand")))]
   "(unsigned HOST_WIDE_INT) INTVAL (operands[3])
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 362266e1f6c..fb07707dcba 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -711,7 +711,7 @@ (define_predicate "register_no_elim_operand"
 
 ;; Similarly, but include the stack pointer.  This is used
 ;; to prevent esp from being used as an index reg.
-(define_predicate "index_register_operand"
+(define_predicate "register_no_SP_operand"
   (match_operand 0 "register_operand")
 {
   if (SUBREG_P (op))
@@ -735,7 +735,7 @@ (define_predicate "index_register_operand"
 (define_predicate "call_register_operand"
   (if_then_else (match_test "TARGET_64BIT")
 (match_operand 0 "register_operand")
-(match_operand 0 "index_register_operand")))
+(match_operand 0 "register_no_SP_operand")))
 
 ;; Return false if this is any eliminable register.  Otherwise general_operand.
 (define_predicate "general_no_elim_operand"


Re: [PATCH V2] RISC-V: Fix PR109615

2023-05-05 Thread Kito Cheng via Gcc-patches
LGTM, committed to trunk with a few changelog adjustments and a few extra comments.


On Fri, May 5, 2023 at 2:33 PM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch is to fix following case:
> void f (int8_t * restrict in, int8_t * restrict out, int n, int m, int cond)
> {
>   size_t vl = 101;
>   if (cond)
> vl = m * 2;
>   else
> vl = m * 2 * vl;
>
>   for (size_t i = 0; i < n; i++)
> {
>   vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i, vl);
>   __riscv_vse8_v_i8mf8 (out + i, v, vl);
>
>   vbool64_t mask = __riscv_vlm_v_b64 (in + i + 100, vl);
>
>   vint8mf8_t v2 = __riscv_vle8_v_i8mf8_tumu (mask, v, in + i + 100, vl);
>   __riscv_vse8_v_i8mf8 (out + i + 100, v2, vl);
> }
>
>   for (size_t i = 0; i < n; i++)
> {
>   vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + 300, vl);
>   __riscv_vse8_v_i8mf8 (out + i + 300, v, vl);
> }
> }
>
> The value of "vl" comes from different blocks, so it is wrapped in a
> PHI node in each block.
>
> In the first loop, the "vl" source is a PHI node from bb 4.
> In the second loop, the "vl" source is a PHI node from bb 5.
> Since bb 5 is dominated by bb 4, the PHI input of "vl" in the second loop
> is the PHI node of "vl" in bb 4.
> So when two "vl" PHI nodes are both degenerate PHI nodes (phi->num_inputs ()
> == 1) and their only inputs are the same, it is safe to consider them
> compatible.
>
> This patch only optimizes degenerate PHIs, since that is a safe and simple
> optimization.
>
> Non-degenerate PHIs are considered incompatible unless the PHIs are the same
> in RTL_SSA.
> TODO: non-degenerate PHIs are complicated; we can support them when it is
> necessary in the future.
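The compatibility rule described above can be sketched as a small standalone
model (a hypothetical reduction; the real check works on RTL_SSA PHI nodes,
not plain structs, and the names here are invented):

```c
#include <assert.h>

/* Two AVL PHI nodes are treated as equal sources when both are
   degenerate (exactly one input) and that single input is the same
   definition; non-degenerate PHIs stay conservatively incompatible.  */
struct phi
{
  int num_inputs;
  int inputs[2]; /* input definition ids */
};

static int
degenerate_phi_equal_p (const struct phi *a, const struct phi *b)
{
  return a->num_inputs == 1 && b->num_inputs == 1
	 && a->inputs[0] == b->inputs[0];
}
```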
>
> Before this patch:
>
> ...
> .L2:
> addia4,a1,100
> add t1,a0,a2
> mv  t0,a0
> beq a2,zero,.L1
> vsetvli zero,a3,e8,mf8,tu,mu
> .L4:
> addia6,t0,100
> addia7,a4,-100
> vle8.v  v1,0(t0)
> addit0,t0,1
> vse8.v  v1,0(a7)
> vlm.v   v0,0(a6)
> vle8.v  v1,0(a6),v0.t
> vse8.v  v1,0(a4)
> addia4,a4,1
> bne t0,t1,.L4
> addia0,a0,300
> addia1,a1,300
> add a2,a0,a2
> vsetvli zero,a3,e8,mf8,ta,ma
> .L5:
> vle8.v  v2,0(a0)
> addia0,a0,1
> vse8.v  v2,0(a1)
> addia1,a1,1
> bne a2,a0,.L5
> .L1:
> ret
>
> After this patch:
>
> ...
> .L2:
> addia4,a1,100
> add t1,a0,a2
> mv  t0,a0
> beq a2,zero,.L1
> vsetvli zero,a3,e8,mf8,tu,mu
> .L4:
> addia6,t0,100
> addia7,a4,-100
> vle8.v  v1,0(t0)
> addit0,t0,1
> vse8.v  v1,0(a7)
> vlm.v   v0,0(a6)
> vle8.v  v1,0(a6),v0.t
> vse8.v  v1,0(a4)
> addia4,a4,1
> bne t0,t1,.L4
> addia0,a0,300
> addia1,a1,300
> add a2,a0,a2
> .L5:
> vle8.v  v2,0(a0)
> addia0,a0,1
> vse8.v  v2,0(a1)
> addia1,a1,1
> bne a2,a0,.L5
> .L1:
> ret
>
> PR target/109615
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (avl_info::multiple_source_equal_p): 
> Add degenerate PHI optimization.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/avl_single-74.c: Adapt testcase.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/pr109615.c: New test.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc  | 81 +--
>  .../riscv/rvv/vsetvl/avl_single-74.c  |  4 +-
>  .../gcc.target/riscv/rvv/vsetvl/pr109615.c| 33 
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-11.c   |  2 +-
>  4 files changed, 54 insertions(+), 66 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109615.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
> index 609f86d8704..39b4d21210b 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -1676,72 +1676,27 @@ avl_info::single_source_equal_p (const avl_info 
> &other) const
>  bool
>  avl_info::multiple_source_equal_p (const avl_info &other) const
>  {
> -  /* TODO: We don't do too much optimization here since it's
> - too complicated in case of analyzing the PHI node.
> +  /* When the def info is the same in the RTL_SSA namespace, it's safe
> + to consider them avl compatible.  */
> +  if (m_source == other.get_source ())
> +return true;
>
> - For example:
> -   void f (void * restrict in, void * restrict out, int n, int m, int 
> cond)
> -   {
> - size_t vl;
> - switch (cond)
> - {
> - case 1:
> -   vl = 100;
> -   break;
> - case 2:
> -   vl = *(size_t*)(in + 100);
> -   break;
> - case 3:
> -   {
> -

Re: [PATCH V3] RISC-V: Enable basic RVV auto-vectorization support.

2023-05-05 Thread Kito Cheng via Gcc-patches
Just one minor comment, otherwise LGTM.

> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 45a63cab9c9..1a35e02796d 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -3793,6 +3793,14 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
> CUMULATIVE_ARGS *cum,
>
>if (named)
>  {
> +  /* TODO: Currently, it will produce an ICE for --param
> +riscv-autovec-preference=fixed-vlmax.  So, we just return NULL_RTX here
> +and let GCC generate loads/stores.  Ideally, GCC should either report a
> +warning message telling the user not to use RVV vector types in function
> +args, or just support the function arg calling convention for RVV
> +directly.  */
> +  if (riscv_v_ext_mode_p (mode))
> +   return NULL_RTX;

Please move this outside the if block so it becomes something like:

/* Your comment here.  */
if (riscv_v_ext_mode_p (mode))
  return NULL_RTX;
if (named)
  {


>riscv_aggregate_field fields[2];
>unsigned fregno = fpr_base + info->fpr_offset;
>unsigned gregno = gpr_base + info->gpr_offset;


Re: [PATCH] tree: Fix up save_expr [PR52339]

2023-05-05 Thread Jakub Jelinek via Gcc-patches
On Fri, May 05, 2023 at 07:38:45AM -0400, Jason Merrill wrote:
> On 5/5/23 06:45, Jakub Jelinek wrote:
> > +  if (TREE_READONLY (t) && !TREE_SIDE_EFFECTS (t))
> > +{
> > +  /* Return true for const qualified vars, but for members or array
> > +elements without side-effects return true only if the base
> > +object is a decl.  If the base is e.g. a pointer dereference,
> > +what the pointer points to could be deallocated or the pointer
> > +could be changed.  See PR52339.  */
> > +  tree base = get_base_address (t);
> > +  if (DECL_P (base))
> > +   return true;
> 
> So I think the above is correct.

Ok, will test it with testsuite adjustments for the Ada testcases.
See below.

> > +  /* As an exception, allow pointer dereferences as long as the pointer
> > +is invariant.  */
> > +  if (TREE_CODE (base) == INDIRECT_REF
> > + && tree_invariant_p_1 (get_base_address (TREE_OPERAND (base, 0
> > +   return true;
> 
> And this is unsafe.

Ok, idea withdrawn.
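The hazard behind the DECL_P (get_base_address (t)) restriction can be
illustrated with a small C example (a hypothetical reduction, not one of the
new testcases):

```c
#include <assert.h>

/* A TREE_READONLY access whose base is a pointer dereference is not
   invariant: the pointer itself can be redirected (or its pointee
   deallocated) between two evaluations of the "same" expression.  */
struct S
{
  const int v;
};

/* Reads (*pp)->v twice, redirecting the pointer in between; returns
   whether the two reads agreed.  */
static int
reads_agree_p (const struct S **pp, const struct S *other,
	       int *first, int *second)
{
  *first = (*pp)->v;  /* readonly, but the base is *pp, not a decl */
  *pp = other;        /* the pointer changes... */
  *second = (*pp)->v; /* ...so the expression now reads other->v */
  return *first == *second;
}
```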

I had a further look at the vect6.adb case, but I think it is for the Ada
people to analyze.

The *.original dump differences there are, as I said, that instead of using
r.P_BOUNDS->LB0
r.P_BOUNDS->UB0
x.P_BOUNDS->LB0
x.P_BOUNDS->UB0
directly, those are wrapped into SAVE_EXPRs in various places (that is the
expected part, that is what the patch was about), but also:
-SAVE_EXPR <x.P_BOUNDS->LB0 < r.P_BOUNDS->LB0 || x.P_BOUNDS->UB0 > r.P_BOUNDS->UB0>;
 <<< Unknown tree: loop_stmt
   I.0 != (unsigned long) vect6__add__L_1__T3b___U
   I.0 = I.0 + 1;
   i = (vect6_pkg__index_type) I.0;
-  if ((SAVE_EXPR <x.P_BOUNDS->LB0 < r.P_BOUNDS->LB0 || x.P_BOUNDS->UB0 > r.P_BOUNDS->UB0>) && .BUILTIN_EXPECT (r.P_BOUNDS->LB0 > i || r.P_BOUNDS->UB0 < i, 0, 15))
+  if (SAVE_EXPR <r.P_BOUNDS->LB0> > i || SAVE_EXPR <r.P_BOUNDS->UB0> < i)
 {
   .gnat_rcheck_CE_Index_Check ("vect6.adb", 9);
 }
So, if the {x,r}.P_BOUNDS->{L,U}B0 expressions aren't wrapped into
SAVE_EXPRs, something in the FE decides to evaluate the
x.P_BOUNDS->LB0 < r.P_BOUNDS->LB0 || x.P_BOUNDS->UB0 > r.P_BOUNDS->UB0
expression into a temporary before the loop and && the bounds condition
inside of the loop with it, while with the patch that doesn't happen.
And that turns out to make loop unswitching successful without my patch
and unsuccessful with it; without the patch we can vectorize the
unswitched loop without the .gnat_rcheck_CE_Index_Check call.

Perhaps ada/gcc-interface/utils2.cc (gnat_invariant_expr) could be taught
to handle SAVE_EXPR by looking at its operand?
--- gcc/ada/gcc-interface/utils2.cc.jj  2023-01-16 23:19:05.539727388 +0100
+++ gcc/ada/gcc-interface/utils2.cc 2023-05-05 15:37:30.193990948 +0200
@@ -3332,6 +3332,7 @@ gnat_invariant_expr (tree expr)
case IMAGPART_EXPR:
case VIEW_CONVERT_EXPR:
CASE_CONVERT:
+   case SAVE_EXPR:
  break;
 
case INDIRECT_REF:
fixes the vect{1,2,3,4,5,6}.adb regressions but not the
loop_optimization21.adb one.  But I'm afraid I really have no idea what
that code is doing.

2023-05-05  Jakub Jelinek  

PR c++/52339
* tree.cc (tree_invariant_p_1): For TREE_READONLY (t) without
side-effects, only return true if DECL_P (get_base_address (t)).

* g++.dg/opt/pr52339.C: New test.
* gcc.c-torture/execute/pr52339-1.c: New test.
* gcc.c-torture/execute/pr52339-2.c: New test.
* gnat.dg/loop_optimization21.adb: Adjust expected match count.
* gnat.dg/vect1.adb: Likewise.
* gnat.dg/vect2.adb: Likewise.
* gnat.dg/vect3.adb: Likewise.
* gnat.dg/vect4.adb: Likewise.
* gnat.dg/vect5.adb: Likewise.
* gnat.dg/vect6.adb: Likewise.

--- gcc/tree.cc.jj  2023-05-01 09:59:46.686293833 +0200
+++ gcc/tree.cc 2023-05-05 10:19:19.061827355 +0200
@@ -3876,10 +3876,21 @@ tree_invariant_p_1 (tree t)
 {
   tree op;
 
-  if (TREE_CONSTANT (t)
-  || (TREE_READONLY (t) && !TREE_SIDE_EFFECTS (t)))
+  if (TREE_CONSTANT (t))
 return true;
 
+  if (TREE_READONLY (t) && !TREE_SIDE_EFFECTS (t))
+{
+  /* Return true for const qualified vars, but for members or array
+elements without side-effects return true only if the base
+object is a decl.  If the base is e.g. a pointer dereference,
+what the pointer points to could be deallocated or the pointer
+could be changed.  See PR52339.  */
+  tree base = get_base_address (t);
+  if (DECL_P (base))
+   return true;
+}
+
   switch (TREE_CODE (t))
 {
 case SAVE_EXPR:
--- gcc/testsuite/g++.dg/opt/pr52339.C.jj   2023-05-04 15:23:20.459935705 +0200
+++ gcc/testsuite/g++.dg/opt/pr52339.C  2023-05-04 15:22:35.640578681 +0200
@@ -0,0 +1,19 @@
+// PR c++/52339
+// { dg-do run { target c++11 } }
+
+
+struct B;
+struct A { B *b; };
+struct B {
+  A *a;
+  B () : a(new A{this}) {}
+  ~B () { delete a; }
+};
+ 
+int
+main ()
+{
+  B *b = new B;
+  const A *a = b->a;
+  delete a->b;
+}
--- gcc/testsuit

[PATCH] RISC-V: Fix PR109748

2023-05-05 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch fixes my recent optimization patch:
https://github.com/gcc-mirror/gcc/commit/d51f2456ee51bd59a79b4725ca0e488c25260bbf

In that patch, the new_info = parse_insn (i) is not correct.
Consider the following case:

vsetvli a5,a4, e8,m1
..
vsetvli zero,a5, e32, m4
vle8.v
vmacc.vv
...

Since we have backward demand fusion in Phase 1, the real demand of "vle8.v" 
is e32, m4.
However, parse_insn (vle8.v) yields e8, m1, which is not correct.

So this patch changes new_info = new_info.parse_insn (i)
into:

vector_insn_info new_info = m_vector_manager->vector_insn_infos[i->uid ()];

With that, we can correctly optimize the code into:

vsetvli a5,a4, e32, m4
..
.. (vsetvli zero,a5, e32, m4 is removed)
vle8.v
vmacc.vv

Since m_vector_manager->vector_insn_infos is a member variable of the
pass_vsetvl class, we remove the static function
"local_eliminate_vsetvl_insn" and make it a member function of
pass_vsetvl.

PR target/109748

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): Remove it.
(pass_vsetvl::local_eliminate_vsetvl_insn): New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr109748.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 102 ++
 .../gcc.target/riscv/rvv/vsetvl/pr109748.c|  36 +++
 2 files changed, 93 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109748.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 39b4d21210b..e1efd7b1c40 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1056,51 +1056,6 @@ change_vsetvl_insn (const insn_info *insn, const 
vector_insn_info &info)
   change_insn (rinsn, new_pat);
 }
 
-static void
-local_eliminate_vsetvl_insn (const vector_insn_info &dem)
-{
-  const insn_info *insn = dem.get_insn ();
-  if (!insn || insn->is_artificial ())
-return;
-  rtx_insn *rinsn = insn->rtl ();
-  const bb_info *bb = insn->bb ();
-  if (vsetvl_insn_p (rinsn))
-{
-  rtx vl = get_vl (rinsn);
-  for (insn_info *i = insn->next_nondebug_insn ();
-  real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
-   {
- if (i->is_call () || i->is_asm ()
- || find_access (i->defs (), VL_REGNUM)
- || find_access (i->defs (), VTYPE_REGNUM))
-   return;
-
- if (has_vtype_op (i->rtl ()))
-   {
- if (!vsetvl_discard_result_insn_p (PREV_INSN (i->rtl (
-   return;
- rtx avl = get_avl (i->rtl ());
- if (avl != vl)
-   return;
- set_info *def = find_access (i->uses (), REGNO (avl))->def ();
- if (def->insn () != insn)
-   return;
-
- vector_insn_info new_info;
- new_info.parse_insn (i);
- if (!new_info.skip_avl_compatible_p (dem))
-   return;
-
- new_info.set_avl_info (dem.get_avl_info ());
- new_info = dem.merge (new_info, LOCAL_MERGE);
- change_vsetvl_insn (insn, new_info);
- eliminate_insn (PREV_INSN (i->rtl ()));
- return;
-   }
-   }
-}
-}
-
 static bool
 source_equal_p (insn_info *insn1, insn_info *insn2)
 {
@@ -2672,6 +2627,7 @@ private:
   void pre_vsetvl (void);
 
   /* Phase 5.  */
+  void local_eliminate_vsetvl_insn (const vector_insn_info &) const;
   void cleanup_insns (void) const;
 
   /* Phase 6.  */
@@ -3993,6 +3949,62 @@ pass_vsetvl::pre_vsetvl (void)
 commit_edge_insertions ();
 }
 
+/* Local user vsetvl optimization:
+
+ Case 1:
+   vsetvl a5,a4,e8,mf8
+   ...
+   vsetvl zero,a5,e8,mf8 --> Eliminate directly.
+
+ Case 2:
+   vsetvl a5,a4,e8,mf8--> vsetvl a5,a4,e32,mf2
+   ...
+   vsetvl zero,a5,e32,mf2 --> Eliminate directly.  */
+void
+pass_vsetvl::local_eliminate_vsetvl_insn (const vector_insn_info &dem) const
+{
+  const insn_info *insn = dem.get_insn ();
+  if (!insn || insn->is_artificial ())
+return;
+  rtx_insn *rinsn = insn->rtl ();
+  const bb_info *bb = insn->bb ();
+  if (vsetvl_insn_p (rinsn))
+{
+  rtx vl = get_vl (rinsn);
+  for (insn_info *i = insn->next_nondebug_insn ();
+  real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
+   {
+ if (i->is_call () || i->is_asm ()
+ || find_access (i->defs (), VL_REGNUM)
+ || find_access (i->defs (), VTYPE_REGNUM))
+   return;
+
+ if (has_vtype_op (i->rtl ()))
+   {
+ if (!vsetvl_discard_result_insn_p (PREV_INSN (i->rtl (
+   return;
+ rtx avl = get_avl (i->rtl ());
+ if (avl != vl)
+   return;
+ set_info *def = find_access (i->uses (), REGNO (avl))->def ();
+ if (def->insn () != insn)
+   return;

Re: [PATCH] RISC-V: Fix PR109748

2023-05-05 Thread Kito Cheng via Gcc-patches
Plz re-title with some description rather than `Fix PR` :)

On Fri, May 5, 2023 at 9:52 PM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch fixes my recent optimization patch:
> https://github.com/gcc-mirror/gcc/commit/d51f2456ee51bd59a79b4725ca0e488c25260bbf
>
> In that patch, the new_info = parse_insn (i) is not correct.
> Consider the following case:
>
> vsetvli a5,a4, e8,m1
> ..
> vsetvli zero,a5, e32, m4
> vle8.v
> vmacc.vv
> ...
>
> Since we have backward demand fusion in Phase 1, the real demand of 
> "vle8.v" is e32, m4.
> However, parse_insn (vle8.v) yields e8, m1, which is not correct.
>
> So this patch changes new_info = new_info.parse_insn (i)
> into:
>
> vector_insn_info new_info = m_vector_manager->vector_insn_infos[i->uid ()];
>
> With that, we can correctly optimize the code into:
>
> vsetvli a5,a4, e32, m4
> ..
> .. (vsetvli zero,a5, e32, m4 is removed)
> vle8.v
> vmacc.vv
>
> Since m_vector_manager->vector_insn_infos is a member variable of the 
> pass_vsetvl class, we remove the static function 
> "local_eliminate_vsetvl_insn" and make it a member function of 
> pass_vsetvl.
>
> PR target/109748
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): Remove 
> it.
> (pass_vsetvl::local_eliminate_vsetvl_insn): New function.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/pr109748.c: New test.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc  | 102 ++
>  .../gcc.target/riscv/rvv/vsetvl/pr109748.c|  36 +++
>  2 files changed, 93 insertions(+), 45 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109748.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc 
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 39b4d21210b..e1efd7b1c40 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -1056,51 +1056,6 @@ change_vsetvl_insn (const insn_info *insn, const 
> vector_insn_info &info)
>change_insn (rinsn, new_pat);
>  }
>
> -static void
> -local_eliminate_vsetvl_insn (const vector_insn_info &dem)
> -{
> -  const insn_info *insn = dem.get_insn ();
> -  if (!insn || insn->is_artificial ())
> -return;
> -  rtx_insn *rinsn = insn->rtl ();
> -  const bb_info *bb = insn->bb ();
> -  if (vsetvl_insn_p (rinsn))
> -{
> -  rtx vl = get_vl (rinsn);
> -  for (insn_info *i = insn->next_nondebug_insn ();
> -  real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
> -   {
> - if (i->is_call () || i->is_asm ()
> - || find_access (i->defs (), VL_REGNUM)
> - || find_access (i->defs (), VTYPE_REGNUM))
> -   return;
> -
> - if (has_vtype_op (i->rtl ()))
> -   {
> - if (!vsetvl_discard_result_insn_p (PREV_INSN (i->rtl (
> -   return;
> - rtx avl = get_avl (i->rtl ());
> - if (avl != vl)
> -   return;
> - set_info *def = find_access (i->uses (), REGNO (avl))->def ();
> - if (def->insn () != insn)
> -   return;
> -
> - vector_insn_info new_info;
> - new_info.parse_insn (i);
> - if (!new_info.skip_avl_compatible_p (dem))
> -   return;
> -
> - new_info.set_avl_info (dem.get_avl_info ());
> - new_info = dem.merge (new_info, LOCAL_MERGE);
> - change_vsetvl_insn (insn, new_info);
> - eliminate_insn (PREV_INSN (i->rtl ()));
> - return;
> -   }
> -   }
> -}
> -}
> -
>  static bool
>  source_equal_p (insn_info *insn1, insn_info *insn2)
>  {
> @@ -2672,6 +2627,7 @@ private:
>void pre_vsetvl (void);
>
>/* Phase 5.  */
> +  void local_eliminate_vsetvl_insn (const vector_insn_info &) const;
>void cleanup_insns (void) const;
>
>/* Phase 6.  */
> @@ -3993,6 +3949,62 @@ pass_vsetvl::pre_vsetvl (void)
>  commit_edge_insertions ();
>  }
>
> +/* Local user vsetvl optimization:
> +
> + Case 1:
> +   vsetvl a5,a4,e8,mf8
> +   ...
> +   vsetvl zero,a5,e8,mf8 --> Eliminate directly.
> +
> + Case 2:
> +   vsetvl a5,a4,e8,mf8--> vsetvl a5,a4,e32,mf2
> +   ...
> +   vsetvl zero,a5,e32,mf2 --> Eliminate directly.  */
> +void
> +pass_vsetvl::local_eliminate_vsetvl_insn (const vector_insn_info &dem) const
> +{
> +  const insn_info *insn = dem.get_insn ();
> +  if (!insn || insn->is_artificial ())
> +return;
> +  rtx_insn *rinsn = insn->rtl ();
> +  const bb_info *bb = insn->bb ();
> +  if (vsetvl_insn_p (rinsn))
> +{
> +  rtx vl = get_vl (rinsn);
> +  for (insn_info *i = insn->next_nondebug_insn ();
> +  real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
> +   {
> + if (i->is_call () || i->is_asm ()
> + || find_access (i->defs (), VL_REGNUM)
> + || find_access (i->defs (), VTYPE_REGNUM))
> +   return;
> +
> +   

[PATCH V4] RISC-V: Enable basic RVV auto-vectorization support.

2023-05-05 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch depends on 
https://patchwork.sourceware.org/project/gcc/patch/20230504054544.203366-1-juzhe.zh...@rivai.ai/
and fixes the code according to Kito's comments.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_get_arg_info): Move RVV type argument 
handling outside.

---
 gcc/config/riscv/riscv.cc | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1a35e02796d..8d3cd4261d2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3791,16 +3791,16 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
CUMULATIVE_ARGS *cum,
   info->gpr_offset = cum->num_gprs;
   info->fpr_offset = cum->num_fprs;
 
+  /* TODO: Currently, it will produce ICE for --param
+ riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
+ let GCC genearte loads/stores. Ideally, GCC should either report
+ Warning message to tell user do not use RVV vector type in function
+ arg, or GCC just support function arg calling convention for RVV
+ directly.  */
+  if (riscv_v_ext_mode_p (mode))
+return NULL_RTX;
   if (named)
 {
-  /* TODO: Currently, it will produce ICE for --param
-riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
-let GCC genearte loads/stores. Ideally, GCC should either report
-Warning message to tell user do not use RVV vector type in function
-arg, or GCC just support function arg calling convention for RVV
-directly.  */
-  if (riscv_v_ext_mode_p (mode))
-   return NULL_RTX;
   riscv_aggregate_field fields[2];
   unsigned fregno = fpr_base + info->fpr_offset;
   unsigned gregno = gpr_base + info->gpr_offset;
-- 
2.36.3



[PATCH V2] RISC-V: Fix incorrect demand info merge in local vsetvli optimization [PR109748]

2023-05-05 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch fixes my recent optimization patch:
https://github.com/gcc-mirror/gcc/commit/d51f2456ee51bd59a79b4725ca0e488c25260bbf

In that patch, the new_info = parse_insn (i) is not correct.
Consider the following case:
   
vsetvli a5,a4, e8,m1
..
vsetvli zero,a5, e32, m4
vle8.v
vmacc.vv
...

Since we have backward demand fusion in Phase 1, the real demand of "vle8.v" 
is e32, m4.
However, parse_insn (vle8.v) yields e8, m1, which is not correct.

So this patch changes new_info = new_info.parse_insn (i)
into:

vector_insn_info new_info = m_vector_manager->vector_insn_infos[i->uid ()];

With that, we can correctly optimize the code into:

vsetvli a5,a4, e32, m4
..
.. (vsetvli zero,a5, e32, m4 is removed)
vle8.v
vmacc.vv

Since m_vector_manager->vector_insn_infos is a member variable of the
pass_vsetvl class, we remove the static function
"local_eliminate_vsetvl_insn" and make it a member function of
pass_vsetvl.

PR target/109748

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): Remove it.
(pass_vsetvl::local_eliminate_vsetvl_insn): New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr109748.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 102 ++
 .../gcc.target/riscv/rvv/vsetvl/pr109748.c|  36 +++
 2 files changed, 93 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109748.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 39b4d21210b..e1efd7b1c40 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1056,51 +1056,6 @@ change_vsetvl_insn (const insn_info *insn, const 
vector_insn_info &info)
   change_insn (rinsn, new_pat);
 }
 
-static void
-local_eliminate_vsetvl_insn (const vector_insn_info &dem)
-{
-  const insn_info *insn = dem.get_insn ();
-  if (!insn || insn->is_artificial ())
-return;
-  rtx_insn *rinsn = insn->rtl ();
-  const bb_info *bb = insn->bb ();
-  if (vsetvl_insn_p (rinsn))
-{
-  rtx vl = get_vl (rinsn);
-  for (insn_info *i = insn->next_nondebug_insn ();
-  real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
-   {
- if (i->is_call () || i->is_asm ()
- || find_access (i->defs (), VL_REGNUM)
- || find_access (i->defs (), VTYPE_REGNUM))
-   return;
-
- if (has_vtype_op (i->rtl ()))
-   {
- if (!vsetvl_discard_result_insn_p (PREV_INSN (i->rtl (
-   return;
- rtx avl = get_avl (i->rtl ());
- if (avl != vl)
-   return;
- set_info *def = find_access (i->uses (), REGNO (avl))->def ();
- if (def->insn () != insn)
-   return;
-
- vector_insn_info new_info;
- new_info.parse_insn (i);
- if (!new_info.skip_avl_compatible_p (dem))
-   return;
-
- new_info.set_avl_info (dem.get_avl_info ());
- new_info = dem.merge (new_info, LOCAL_MERGE);
- change_vsetvl_insn (insn, new_info);
- eliminate_insn (PREV_INSN (i->rtl ()));
- return;
-   }
-   }
-}
-}
-
 static bool
 source_equal_p (insn_info *insn1, insn_info *insn2)
 {
@@ -2672,6 +2627,7 @@ private:
   void pre_vsetvl (void);
 
   /* Phase 5.  */
+  void local_eliminate_vsetvl_insn (const vector_insn_info &) const;
   void cleanup_insns (void) const;
 
   /* Phase 6.  */
@@ -3993,6 +3949,62 @@ pass_vsetvl::pre_vsetvl (void)
 commit_edge_insertions ();
 }
 
+/* Local user vsetvl optimization:
+
+ Case 1:
+   vsetvl a5,a4,e8,mf8
+   ...
+   vsetvl zero,a5,e8,mf8 --> Eliminate directly.
+
+ Case 2:
+   vsetvl a5,a4,e8,mf8--> vsetvl a5,a4,e32,mf2
+   ...
+   vsetvl zero,a5,e32,mf2 --> Eliminate directly.  */
+void
+pass_vsetvl::local_eliminate_vsetvl_insn (const vector_insn_info &dem) const
+{
+  const insn_info *insn = dem.get_insn ();
+  if (!insn || insn->is_artificial ())
+return;
+  rtx_insn *rinsn = insn->rtl ();
+  const bb_info *bb = insn->bb ();
+  if (vsetvl_insn_p (rinsn))
+{
+  rtx vl = get_vl (rinsn);
+  for (insn_info *i = insn->next_nondebug_insn ();
+  real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
+   {
+ if (i->is_call () || i->is_asm ()
+ || find_access (i->defs (), VL_REGNUM)
+ || find_access (i->defs (), VTYPE_REGNUM))
+   return;
+
+ if (has_vtype_op (i->rtl ()))
+   {
+ if (!vsetvl_discard_result_insn_p (PREV_INSN (i->rtl (
+   return;
+ rtx avl = get_avl (i->rtl ());
+ if (avl != vl)
+   return;
+ set_info *def = find_access (i->uses (), REGNO (avl))->def ();
+ if (def->insn () != insn)
+   return;
+

Re: [PATCH V3] RISC-V: Enable basic RVV auto-vectorization support.

2023-05-05 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

I wasn't yet able to check this locally so just some minor comment nits:

> +/* Return the vectorization machine mode for RVV according to LMUL.  */
> +machine_mode
> +preferred_simd_mode (scalar_mode mode)
> +{
> +  /* We only enable auto-vectorization when TARGET_MIN_VLEN < 128 &&
> + riscv_autovec_lmul < RVV_M2. Since GCC loop vectorizer report ICE when 
> we
> + enable -march=rv64gc_zve32* and -march=rv32gc_zve64*. in the

I believe Kito mentioned this in the last iteration but the comment
here doesn't match the code below.  You want >= 128 instead of < 128.

> +  /* TODO: Currently, it will produce ICE for --param
> +  riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
> +  let GCC genearte loads/stores. Ideally, GCC should either report
> +  Warning message to tell user do not use RVV vector type in function
> +  arg, or GCC just support function arg calling convention for RVV
> +  directly.  */
> +  if (riscv_v_ext_mode_p (mode))
> + return NULL_RTX;

will produce -> will cause an ICE

genearte -> generate

GCC should either... -> we should either warn the user not to use
an RVV vector type as function argument ... or support the calling
convention

> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32 -mpreferred-stack-boundary=3 
> -fno-schedule-insns -fno-schedule-insns2 -O3 --param 
> riscv-autovec-preference=fixed-vlmax" } */
> +
> +#include "riscv_vector.h"
> +
> +void f (char*);
> +
> +void stach_check_alloca_1 (vuint8m1_t data, uint8_t *base, int y, ...)
> +{

Shouldn't that be stack rather than stach?

> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h
> @@ -0,0 +1,106 @@
> +#include 
> +#include 
> +
> +#define N 777
> +
> +#define test_1(TYPE) 
>   \
> +  TYPE a_##TYPE[N];  
>   \
> +  TYPE b_##TYPE[N];  
>   \
> +  void __attribute__ ((noinline, noclone)) test_1_##TYPE (unsigned int n)
>   \

Just FYI, you can use ((noipa)) to cover all cases of unwanted "inlining".  Not
needed here though.

> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
> new file mode 100644
> index 000..7ff84f60749
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c
> @@ -0,0 +1,4 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32d --param 
> riscv-autovec-preference=scalable -O3 -ftree-vectorize 
> -fdump-tree-vect-details -save-temps" } */
> +
> +#include "template-1.h"

I'm a bit wary of these tests not checking anything.  Of course we will see
if we ICE or not, but that much I would expect from a "new feature".  Couldn't
we at least check something else that gives a clue as to what is supposed to
happen?
Last time I tried some of those locally, we would not vectorize.  In case that's
intended we could check for e.g. "vectorized 0 loops in function".  If not, a
comment would still help.
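A sketch of such a check (the dejagnu directives are standard; the exact dump
string is an assumption about the vectorizer's opt-info message):

```c
/* Expect no vectorization yet; requires -fdump-tree-vect-details.  */
/* { dg-final { scan-tree-dump "vectorized 0 loops in function" "vect" } } */
```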

Do we actually need -ftree-vectorize at -O3?  In general I would prefer to
split off common options and set them in rvv.exp already, only giving
dg-additional-options for each test.  Here we don't share too many apart from
-O3 -ftree-vectorize so it's not yet necessary.

Regards
 Robin


Re: [PATCH v2] RISC-V: Legitimise the const0_rtx for RVV indexed load/store

2023-05-05 Thread Kito Cheng via Gcc-patches
pushed to trunk, thanks :)

On Thu, May 4, 2023 at 5:12 PM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> This patch tries to legitimise the const0_rtx (aka zero register)
> as the base register for the RVV indexed load/store instructions
> by allowing the const as the operand of the indexed RTL pattern.
> Then the underlying combine pass will try to perform the const
> propagation.
>
> For example:
> vint32m1_t
> test_vluxei32_v_i32m1_shortcut (vuint32m1_t bindex, size_t vl)
> {
>   return __riscv_vluxei32_v_i32m1 ((int32_t *)0, bindex, vl);
> }
>
> Before this patch:
> li a5,0 <- can be eliminated.
> vl1re32.v  v1,0(a1)
> vsetvlizero,a2,e32,m1,ta,ma
> vluxei32.v v1,(a5),v1   <- can propagate the const 0 to a5 here.
> vs1r.v v1,0(a0)
> ret
>
> After this patch:
> test_vluxei32_v_i32m1_shortcut:
> vl1re32.v   v1,0(a1)
> vsetvli zero,a2,e32,m1,ta,ma
> vluxei32.v  v1,(0),v1
> vs1r.v  v1,0(a0)
> ret
>
> As above, this patch allows the const 0 (aka zero register) to be
> propagated to the base register of the RVV indexed load in the combine
> pass. This may benefit the underlying RVV auto-vectorization.
>
> gcc/ChangeLog:
>
> * config/riscv/vector.md: Allow const as the operand of RVV
>   indexed load/store.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c:
>   Adjust indexed load/store check condition.
>
> Signed-off-by: Pan Li 
> Co-authored-by: Ju-Zhe Zhong 
> ---
>  gcc/config/riscv/vector.md| 62 +--
>  .../base/zero_base_load_store_optimization.c  |  3 +-
>  2 files changed, 33 insertions(+), 32 deletions(-)
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 92115e3935f..dc05e9fc713 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -1511,12 +1511,12 @@ (define_insn 
> "@pred_indexed_load_same_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:V
> -   [(match_operand 3 "pmode_register_operand""  r,  r, r,  r")
> +   [(match_operand 3 "pmode_reg_or_0_operand"" rJ, rJ,rJ, rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand" " vr, vr,vr, vr")] 
> ORDER)
>   (match_operand:V 2 "vector_merge_operand"   " vu, vu, 0,  0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(set_attr "type" "vldx")
> (set_attr "mode" "")])
>
> @@ -1533,12 +1533,12 @@ (define_insn 
> "@pred_indexed_load_x2_greater_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:VEEWEXT2
> -   [(match_operand 3 "pmode_register_operand" "
> r,r")
> +   [(match_operand 3 "pmode_reg_or_0_operand" "   
> rJ,   rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand" "   
> vr,   vr")] ORDER)
>   (match_operand:VEEWEXT2 2 "vector_merge_operand" "   
> vu,0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(set_attr "type" "vldx")
> (set_attr "mode" "")])
>
> @@ -1554,12 +1554,12 @@ (define_insn 
> "@pred_indexed_load_x4_greater_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:VEEWEXT4
> -   [(match_operand 3 "pmode_register_operand" "
> r,r")
> +   [(match_operand 3 "pmode_reg_or_0_operand" "   
> rJ,   rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand"   "   
> vr,   vr")] ORDER)
>   (match_operand:VEEWEXT4 2 "vector_merge_operand" "   
> vu,0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(set_attr "type" "vldx")
> (set_attr "mode" "")])
>
> @@ -1575,12 +1575,12 @@ (define_insn 
> "@pred_indexed_load_x8_greater_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:VEEWEXT8
> -   [(match_operand 3 "pmode_register_operand" "
> r,r")
> +   [(match_operand 3 "pmode_reg_or_0_operand" "   
> rJ,   rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand""   
> vr,   vr")] ORDER)
>   (match_operand:VEEWEXT8 2 "vector_merge_operand" "   
> vu,0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(set_attr "type" "vldx")
> (set_attr "mode" "")])
>
> @@ -1597,12 +1597,12 @@ (define_insn 
> "@pred_indexed_load_x2_smaller_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:VEEWTRUNC2
> -   [(match_operand 3 "

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-05-05 Thread Kito Cheng via Gcc-patches
pushed v1 to trunk

On Fri, May 5, 2023 at 8:46 PM Li, Pan2 via Gcc-patches
 wrote:
>
> Ok, sounds good. Thank you!
>
> Pan
>
> -Original Message-
> From: Kito Cheng 
> Sent: Friday, May 5, 2023 8:37 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
> 
> Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
>
> I will take V1 and commit to trunk after my local test is done :)
>
> On Fri, May 5, 2023 at 8:30 PM Li, Pan2  wrote:
> >
> > Hi kito,
> >
> > Could you please share any suggestions about the patch, comparing 
> > V1 and V2?
> >
> > Pan
> >
> >
> > -Original Message-
> > From: Li, Pan2
> > Sent: Wednesday, May 3, 2023 7:18 PM
> > To: Jeff Law ; Kito Cheng
> > 
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> > ; Andrew Waterman 
> > Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify
> > to VMSET
> >
> > Thanks all for comments, will work with kito to make it happen.
> >
> > Pan
> >
> > -Original Message-
> > From: Jeff Law 
> > Sent: Wednesday, May 3, 2023 12:28 AM
> > To: Kito Cheng 
> > Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org;
> > juzhe.zh...@rivai.ai; Wang, Yanzhang ; Andrew
> > Waterman 
> > Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify
> > to VMSET
> >
> >
> >
> > On 4/29/23 19:40, Kito Cheng wrote:
> > > Hi Jeff:
> > >
> > > The RTL pattern already models tail element and vector length well,
> > > so I don't feel the first version of Pan's patch has any problem?
> > >
> > > Input RTL pattern:
> > >
> > > #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > > #(if_then_else:VNx2BI (unspec:VNx2BI [
> > > #(const_vector:VNx2BI repeat [
> > > #(const_int 1 [0x1])
> > > #])  # all-1 mask
> > > #(reg:DI 143)  # AVL reg, or vector length
> > > #(const_int 2 [0x2]) # mask policy
> > > #(const_int 0 [0])   # avl type
> > > #(reg:SI 66 vl)
> > > #(reg:SI 67 vtype)
> > > #] UNSPEC_VPREDICATE)
> > > #(geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> > > #(reg/v:VNx2QI 137 [ v1 ]))
> > > #(unspec:VNx2BI [
> > > #(reg:SI 0 zero)
> > > #] UNSPEC_VUNDEF))) # maskoff and tail operand
> > > # (expr_list:REG_DEAD (reg:DI 143)
> > > #(expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> > > #(nil
> > >
> > > And the split pattern, only did on tail/maskoff element with undefined 
> > > value:
> > >
> > > (define_split
> > >   [(set (match_operand:VB  0 "register_operand")
> > > (if_then_else:VB
> > >   (unspec:VB
> > > [(match_operand:VB 1 "vector_all_trues_mask_operand")
> > >  (match_operand4 "vector_length_operand")
> > >  (match_operand5 "const_int_operand")
> > >  (match_operand6 "const_int_operand")
> > >  (reg:SI VL_REGNUM)
> > >  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> > >   (match_operand:VB3 "vector_move_operand")
> > >   (match_operand:VB2 "vector_undef_operand")))] # maskoff
> > > and tail operand, only match undef value
> > >
> > > Then it turns into vmset, and also discard mask policy operand
> > > (since maskoff is undef means don't care IMO):
> > >
> > > (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > > (if_then_else:VNx2BI (unspec:VNx2BI [
> > > (const_vector:VNx2BI repeat [
> > > (const_int 1 [0x1])
> > > ])  # all-1 mask
> > > (reg:DI 143) # AVL reg, or vector length
> > > (const_int 2 [0x2]) # mask policy
> > > (reg:SI 66 vl)
> > > (reg:SI 67 vtype)
> > > ] UNSPEC_VPREDICATE)
> > > (const_vector:VNx2BI repeat [
> > > (const_int 1 [0x1])
> > > ])# all-1
> > > (unspec:VNx2BI [
> > > (reg:SI 0 zero)
> > > ] UNSPEC_VUNDEF))) # still vundef
> > >  (expr_list:REG_DEAD (reg:DI 143)
> > > (nil)))
> > Right.  My concern is that when we call relational_result it's going to 
> > return -1 (as a vector of bools) which bubbles up through the call
> > chain.   If that doesn't match the actual register state after the
> > instruction (irrespective of the tail policy), then we have the potential 
> > to generate incorrect code.
> >
> > For example, if there's a subsequent instruction that tried to set a vector 
> > register to -1, it could just copy from the destination of the vmset to the 
> > new target.  But if the vmset didn't set all the bits to 1, then the code 
> > is wrong.
> >
> > With all the UNSPECs in place, this may not be a problem in practice.
> > Un

RE: [PATCH v2] RISC-V: Legitimise the const0_rtx for RVV indexed load/store

2023-05-05 Thread Li, Pan2 via Gcc-patches
Thank you!

-Original Message-
From: Kito Cheng  
Sent: Friday, May 5, 2023 10:52 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, 
Yanzhang 
Subject: Re: [PATCH v2] RISC-V: Legitimise the const0_rtx for RVV indexed 
load/store

pushed to trunk, thanks :)

On Thu, May 4, 2023 at 5:12 PM Pan Li via Gcc-patches  
wrote:
>
> From: Pan Li 
>
> This patch tries to legitimise the const0_rtx (aka zero register) as the 
> base register for the RVV indexed load/store instructions by allowing 
> the const as the operand of the indexed RTL pattern.
> Then the underlying combine pass will try to perform the const 
> propagation.
>
> For example:
> vint32m1_t
> test_vluxei32_v_i32m1_shortcut (vuint32m1_t bindex, size_t vl) {
>   return __riscv_vluxei32_v_i32m1 ((int32_t *)0, bindex, vl); }
>
> Before this patch:
> li a5,0 <- can be eliminated.
> vl1re32.v  v1,0(a1)
> vsetvlizero,a2,e32,m1,ta,ma
> vluxei32.v v1,(a5),v1   <- can propagate the const 0 to a5 here.
> vs1r.v v1,0(a0)
> ret
>
> After this patch:
> test_vluxei32_v_i32m1_shortcut:
> vl1re32.v   v1,0(a1)
> vsetvli zero,a2,e32,m1,ta,ma
> vluxei32.v  v1,(0),v1
> vs1r.v  v1,0(a0)
> ret
>
> As above, this patch allows the const 0 (aka zero
> register) to be propagated to the base register of the RVV indexed load in
> the combine pass. This may benefit the underlying RVV auto-vectorization.
>
> gcc/ChangeLog:
>
> * config/riscv/vector.md: Allow const as the operand of RVV
>   indexed load/store.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c:
>   Adjust indexed load/store check condition.
>
> Signed-off-by: Pan Li 
> Co-authored-by: Ju-Zhe Zhong 
> ---
>  gcc/config/riscv/vector.md| 62 +--
>  .../base/zero_base_load_store_optimization.c  |  3 +-
>  2 files changed, 33 insertions(+), 32 deletions(-)
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md 
> index 92115e3935f..dc05e9fc713 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -1511,12 +1511,12 @@ (define_insn 
> "@pred_indexed_load_same_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:V
> -   [(match_operand 3 "pmode_register_operand""  r,  r, r,  r")
> +   [(match_operand 3 "pmode_reg_or_0_operand"" rJ, rJ,rJ, rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand" " vr, vr,vr, vr")] 
> ORDER)
>   (match_operand:V 2 "vector_merge_operand"   " vu, vu, 0,  0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(set_attr "type" "vldx")
> (set_attr "mode" "")])
>
> @@ -1533,12 +1533,12 @@ (define_insn 
> "@pred_indexed_load_x2_greater_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:VEEWEXT2
> -   [(match_operand 3 "pmode_register_operand" "
> r,r")
> +   [(match_operand 3 "pmode_reg_or_0_operand" "   
> rJ,   rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand" "   
> vr,   vr")] ORDER)
>   (match_operand:VEEWEXT2 2 "vector_merge_operand" "   
> vu,0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(set_attr "type" "vldx")
> (set_attr "mode" "")])
>
> @@ -1554,12 +1554,12 @@ (define_insn 
> "@pred_indexed_load_x4_greater_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:VEEWEXT4
> -   [(match_operand 3 "pmode_register_operand" "
> r,r")
> +   [(match_operand 3 "pmode_reg_or_0_operand" "   
> rJ,   rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand"   "   
> vr,   vr")] ORDER)
>   (match_operand:VEEWEXT4 2 "vector_merge_operand" "   
> vu,0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(set_attr "type" "vldx")
> (set_attr "mode" "")])
>
> @@ -1575,12 +1575,12 @@ (define_insn 
> "@pred_indexed_load_x8_greater_eew"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (unspec:VEEWEXT8
> -   [(match_operand 3 "pmode_register_operand" "
> r,r")
> +   [(match_operand 3 "pmode_reg_or_0_operand" "   
> rJ,   rJ")
>  (mem:BLK (scratch))
>  (match_operand: 4 "register_operand""   
> vr,   vr")] ORDER)
>   (match_operand:VEEWEXT8 2 "vector_merge_operand" "   
> vu,0")))]
>"TARGET_VECTOR"
> -  "vlxei.v\t%0,(%3),%4%p1"
> +  "vlxei.v\t%0,(%z3),%4%p1"
>[(s

[PATCH V5] RISC-V: Enable basic RVV auto-vectorization support.

2023-05-05 Thread juzhe . zhong
From: Juzhe-Zhong 

Address comments from Robin.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (preferred_simd_mode): Fix comments.
* config/riscv/riscv.cc (riscv_get_arg_info): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: Fix function name.
* gcc.target/riscv/rvv/autovec/v-1.c: Remove -O3 -ftree-vectorize.
* gcc.target/riscv/rvv/autovec/v-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add -O3 -ftree-vectorize.

---
 gcc/config/riscv/riscv-v.cc  | 2 +-
 gcc/config/riscv/riscv.cc| 9 -
 .../gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c | 4 +++-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/v-1.c | 7 ++-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/v-2.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-1.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-2.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32f-3.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c  | 2 +-
 .../gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x-1.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x-2.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve32x-3.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c  | 2 +-
 .../gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-1.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-2.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64d-3.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c  | 2 +-
 .../gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-1.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-2.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64f-3.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c  | 2 +-
 .../gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x-1.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x-2.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/zve64x-3.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c  | 2 +-
 .../gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp   | 2 +-
 31 files changed, 41 insertions(+), 35 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 82510743eb8..1f887f7e747 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -940,7 +940,7 @@ autovec_use_vlmax_p (void)
 machine_mode
 preferred_simd_mode (scalar_mode mode)
 {
-  /* We only enable auto-vectorization when TARGET_MIN_VLEN < 128 &&
+  /* We will disable auto-vectorization when TARGET_MIN_VLEN < 128 &&
  riscv_autovec_lmul < RVV_M2. Since GCC loop vectorizer report ICE when we
  enable -march=rv64gc_zve32* and -march=rv32gc_zve64*. in the
  'can_duplicate_and_interleave_p' of tree-vect-slp.cc. Since we have
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8d3cd4261d2..aa985c2f456 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3791,12 +3791,11 @@ riscv_get_arg_info (struct riscv_arg_info *info, co

Re: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread David Edelsohn via Gcc-patches
This patch has broken GCC bootstrap on AIX.  It appears to rely upon, or
complain about, the command "seq":

/nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11   -g -DIN_GCC
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format
-Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual
-pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings
-fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE -static-libstdc++
-static-libgcc -Wl,-bbigtoc -Wl,-bmaxdata:0x4000 -o build/genmatch \
build/genmatch.o ../build-powerpc-ibm-aix7.2.5.0/libcpp/libcpp.a
build/errors.o build/vec.o build/hash-table.o build/sort.o
../build-powerpc-ibm-aix7.2.5.0/libiberty/libiberty.a
/usr/bin/bash: seq: command not found
/usr/bin/bash: seq: command not found
build/genmatch --gimple \
--header=tmp-gimple-match-auto.h --include=gimple-match-auto.h \
/nasfarm/edelsohn/src/src/gcc/match.pd

All of the match files are dumped to stdout.

Thanks, David


Re: [PATCH v8] RISC-V: Add the 'zfa' extension, version 0.2.

2023-05-05 Thread Christoph Müllner
On Wed, Apr 19, 2023 at 11:58 AM Jin Ma  wrote:
>
> This patch adds the 'Zfa' extension for riscv, which is based on:
>   https://github.com/riscv/riscv-isa-manual/commits/zfb
>   
> https://github.com/riscv/riscv-isa-manual/commit/1f038182810727f5feca311072e630d6baac51da
>
> The binutils-gdb for 'Zfa' extension:
>   https://github.com/a4lg/binutils-gdb/commits/riscv-zfa
>
> What needs special explanation is:
> 1, The immediate operand of the FLI.H/S/D instructions is represented in the
>   assembly as a floating-point value, printed in scientific notation when
>   rs1 is 1 or 2, and as a decimal number otherwise.
>
>   Related llvm link:
> https://reviews.llvm.org/D145645
>   Related discussion link:
> https://github.com/riscv/riscv-isa-manual/issues/980
>
> 2, According to the riscv-spec, "The FCVTMOD.W.D instruction was added
>   principally to accelerate the processing of JavaScript Numbers.", so it
>   seems that no implementation is required.
>
> 3, The instructions FMINM and FMAXM correspond to the C23 library functions
>   fminimum and fmaximum.  Therefore, this patch simply implements the
>   fminm3 and fmaxm3 patterns to prepare for later.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: Add zfa extension version.
> * config/riscv/constraints.md (Zf): Constrain the floating point 
> number that the
> instructions FLI.H/S/D can load.
> ((TARGET_XTHEADFMV || TARGET_ZFA) ? FP_REGS : NO_REGS): Enable
> FMVP.D.X and FMVH.X.D.
> * config/riscv/iterators.md (ceil): New.
> * config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): 
> New.
> * config/riscv/riscv.cc (find_index_in_array): New.
> (riscv_float_const_rtx_index_for_fli): Get the index of the
> floating-point number that the instructions FLI.H/S/D can move.
> (riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, 
> memory is not applicable.
> (riscv_const_insns): The cost of FLI.H/S/D is 3.
> (riscv_legitimize_const_move): Likewise.
> (riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no 
> split is required.
> (riscv_output_move): Output the mov instructions in zfa extension.
> (riscv_print_operand): Output the floating-point value of the
> FLI.H/S/D immediate in assembly.
> (riscv_secondary_memory_needed): Likewise.
> * config/riscv/riscv.h (GP_REG_RTX_P): New.
> * config/riscv/riscv.md (fminm3): New.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zfa-fleq-fltq-rv32.c: New test.
> * gcc.target/riscv/zfa-fleq-fltq.c: New test.
> * gcc.target/riscv/zfa-fli-rv32.c: New test.
> * gcc.target/riscv/zfa-fli-zfh-rv32.c: New test.
> * gcc.target/riscv/zfa-fli-zfh.c: New test.
> * gcc.target/riscv/zfa-fli.c: New test.
> * gcc.target/riscv/zfa-fmovh-fmovp-rv32.c: New test.
> * gcc.target/riscv/zfa-fround-rv32.c: New test.
> * gcc.target/riscv/zfa-fround.c: New test.
> ---
>  gcc/common/config/riscv/riscv-common.cc   |   4 +
>  gcc/config/riscv/constraints.md   |  11 +-
>  gcc/config/riscv/iterators.md |   5 +
>  gcc/config/riscv/riscv-opts.h |   3 +
>  gcc/config/riscv/riscv-protos.h   |   1 +
>  gcc/config/riscv/riscv.cc | 168 +-
>  gcc/config/riscv/riscv.h  |   1 +
>  gcc/config/riscv/riscv.md | 112 +---
>  .../gcc.target/riscv/zfa-fleq-fltq-rv32.c |  19 ++
>  .../gcc.target/riscv/zfa-fleq-fltq.c  |  19 ++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c |  79 
>  .../gcc.target/riscv/zfa-fli-zfh-rv32.c   |  41 +
>  gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c  |  41 +
>  gcc/testsuite/gcc.target/riscv/zfa-fli.c  |  79 
>  .../gcc.target/riscv/zfa-fmovh-fmovp-rv32.c   |  10 ++
>  .../gcc.target/riscv/zfa-fround-rv32.c|  42 +
>  gcc/testsuite/gcc.target/riscv/zfa-fround.c   |  42 +
>  17 files changed, 652 insertions(+), 25 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fround-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fround.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 309a52def75..f9fce6bcc38 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/risc

Re: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Jeff Law via Gcc-patches
On 5/5/23 08:59, David Edelsohn via Gcc-patches wrote:

This patch has broken GCC bootstrap on AIX.  It appears to rely upon, or
complain about, the command "seq":

/nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11   -g -DIN_GCC
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format
-Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual
-pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings
-fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE -static-libstdc++
-static-libgcc -Wl,-bbigtoc -Wl,-bmaxdata:0x4000 -o build/genmatch \
 build/genmatch.o ../build-powerpc-ibm-aix7.2.5.0/libcpp/libcpp.a
build/errors.o build/vec.o build/hash-table.o build/sort.o
../build-powerpc-ibm-aix7.2.5.0/libiberty/libiberty.a
/usr/bin/bash: seq: command not found
/usr/bin/bash: seq: command not found
build/genmatch --gimple \
 --header=tmp-gimple-match-auto.h --include=gimple-match-auto.h \
 /nasfarm/edelsohn/src/src/gcc/match.pd

All of the match files are dumped to stdout.
Sigh.  So the question is whether we make seq a requirement, or whether we 
implement an alternative way to generate the sequence as a fallback.


jeff
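A minimal sketch of such a fallback, assuming only POSIX awk is available on the build host (gen_seq is a hypothetical helper name, not anything in the gcc Makefile):

```shell
#!/bin/sh
# Stand-in for `seq 1 N` on hosts such as AIX that lack seq,
# using only POSIX awk.
gen_seq() {
  awk -v n="$1" 'BEGIN { for (i = 1; i <= n; i++) print i }'
}

gen_seq 3
```

This prints the integers 1 through 3, one per line, matching what `seq 1 3` would emit.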

