Thanks for doing this and sorry for the slow review.

Jennifer Schmitz <jschm...@nvidia.com> writes:
> If an SVE intrinsic has predicate pfalse, we can fold the call to
> a simplified assignment statement: For _m, _x, and implicit predication,
> the LHS can be assigned the operand for inactive values and for _z, we
> can assign a zero vector.
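For the _z case, my own example of the intended fold (not taken from the
patch, just to make it concrete) would be something like:

#include <arm_sve.h>

svint32_t foo_z (svint32_t op1, svint32_t op2)
{
  /* Every lane is inactive and _z zeroes inactive lanes, so the whole
     result is zero.  */
  return svadd_s32_z (svpfalse_b (), op1, op2);
}

which should compile to just a zeroing move and a RET, matching what the
new tests scan for.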
_x functions don't really specify the values of inactive lanes.  The
values are instead "don't care" and so we have a free choice.  Using
zeros (as for _z) should be more efficient than reusing one of the
inputs.

> For example,
> svint32_t foo (svint32_t op1, svint32_t op2)
> {
>   return svadd_s32_m (svpfalse_b (), op1, op2);
> }
> can be folded to lhs <- op1, such that foo is compiled to just a RET.
>
> We implemented this optimization during gimple folding by calling a new
> method function_shape::fold_pfalse from gimple_folder::fold.
> The implementations of fold_pfalse in the function_shape subclasses define
> the expected behavior for a pfalse predicate for the different predications
> and return an appropriate gimple statement.
> To avoid code duplication, function_shape::fold_pfalse calls a new method
> gimple_folder::fold_by_pred that takes arguments of type tree for each
> predication and returns the new assignment statement depending on the
> predication.
>
> We tested the new behavior for each intrinsic with all supported
> predications and data types and checked the produced assembly.  There is
> a test file for each shape subclass with scan-assembler-times tests that
> look for the simplified instruction sequences, such as individual RET
> instructions or zeroing moves.  There is an additional directive counting
> the total number of functions in the test, which must be the sum of the
> counts of all other directives.  This is to check that all tested
> intrinsics were optimized.
>
> In this patch, we only implemented function_shape::fold_pfalse for
> binary shapes.  But we plan to cover more shapes in follow-up patches,
> after getting feedback on this patch.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no
> regression.
> OK for mainline?

My main question is: can we not do this generically for _z, _x and _m,
without the virtual method?  (I wanted to prototype this locally before
asking, hence the delay, but never found time.)

The only snag is that we need to check whether the inactive lanes for _m
are specified using a separate parameter before the predicate, or whether
they're taken from the parameter after the predicate.  But that should be
a generic rule that can be checked by looking at the types.

That is, if the predication is PRED_z, PRED_m, or PRED_x, then I think we
can say that:

(1) if the first argument is a vector and the second argument is a pfalse
    predicate, the call folds to the first argument;

(2) if the first argument is a pfalse predicate and the predication is
    PRED_m, the call folds to the second argument;

(3) if the first argument is a pfalse predicate and the predication is
    not PRED_m, the call folds to zero.

But perhaps I've forgotten a case.

The implicit cases would still need to be handled separately.  Perhaps for
that case we could add a general fold method to the shape?  I don't think
we need to specialise the name of the virtual function beyond just "fold".

It would be good to put:

> +  gimple_seq stmts = NULL;
> +  gimple *g = gimple_build_assign (lhs, new_lhs);
> +  gimple_seq_add_stmt_without_update (&stmts, g);
> +  gsi_replace_with_seq_vops (gsi, stmts);
> +  return g;

into a helper so that we can use it for any g.  (Maybe with yet another
helper for folding to a gimple value, as here.)  Some of the existing
routines could use that helper too, so it would make a natural prepatch.

I think I've made that sound more complicated than it really is, sorry.
But I think the net effect should be less code, with automatic support
for unary operations.
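Concretely, and only as an untested sketch to show what I mean (the names
fold_call_to_stmt and fold_call_to are placeholders for the helpers
suggested above, and the VECTOR_BOOLEAN_TYPE_P test is just one possible
way of spelling the "look at the types" check), I'm imagining something
along these lines:

/* Placeholder helper: replace the call with statement G, keeping the
   original virtual operands, and return G.  */
gimple *
gimple_folder::fold_call_to_stmt (gimple *g)
{
  gimple_seq stmts = NULL;
  gimple_seq_add_stmt_without_update (&stmts, g);
  gsi_replace_with_seq_vops (gsi, stmts);
  return g;
}

/* Placeholder helper: fold the call to an assignment of VAL to its lhs.  */
gimple *
gimple_folder::fold_call_to (tree val)
{
  return fold_call_to_stmt (gimple_build_assign (lhs, val));
}

/* ...and then in gimple_folder::fold, before trying base->fold:  */
if (pred == PRED_z || pred == PRED_m || pred == PRED_x)
  {
    tree arg0 = gimple_call_arg (call, 0);
    tree arg1 = (gimple_call_num_args (call) > 1
		 ? gimple_call_arg (call, 1) : NULL_TREE);
    /* (1) An explicit "inactive" vector followed by a pfalse governing
       predicate: the result is just that vector.  */
    if (VECTOR_TYPE_P (TREE_TYPE (arg0))
	&& !VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (arg0))
	&& arg1
	&& is_pfalse (arg1))
      return fold_call_to (arg0);
    if (is_pfalse (arg0))
      {
	/* (2) _m with a pfalse governing predicate merges into the first
	   data operand.  */
	if (pred == PRED_m && arg1)
	  return fold_call_to (arg1);
	/* (3) _z zeros all inactive lanes and _x leaves them undefined,
	   so both can fold to zero.  */
	return fold_call_to (build_zero_cst (TREE_TYPE (lhs)));
      }
  }

Completely untested, just to illustrate the shape of it.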
Thanks,
Richard

> @@ -3666,6 +3683,39 @@ gimple_folder::fold_active_lanes_to (tree x)
>    return gimple_build_assign (lhs, VEC_COND_EXPR, pred, x, vec_inactive);
>  }
>  
> +/* Fold call to assignment statement
> +     lhs = new_lhs,
> +   where new_lhs is determined by the predication.
> +   Return the gimple statement on success, else return NULL.  */
> +gimple *
> +gimple_folder::fold_by_pred (tree m, tree x, tree z, tree implicit)
> +{
> +  tree new_lhs = NULL;
> +  switch (pred)
> +    {
> +    case PRED_z:
> +      new_lhs = z;
> +      break;
> +    case PRED_m:
> +      new_lhs = m;
> +      break;
> +    case PRED_x:
> +      new_lhs = x;
> +      break;
> +    case PRED_implicit:
> +      new_lhs = implicit;
> +      break;
> +    default:
> +      return NULL;
> +    }
> +  gcc_assert (new_lhs);
> +  gimple_seq stmts = NULL;
> +  gimple *g = gimple_build_assign (lhs, new_lhs);
> +  gimple_seq_add_stmt_without_update (&stmts, g);
> +  gsi_replace_with_seq_vops (gsi, stmts);
> +  return g;
> +}
> +
>  /* Try to fold the call.  Return the new statement on success and null
>     on failure.  */
>  gimple *
> @@ -3685,6 +3735,9 @@ gimple_folder::fold ()
>    /* First try some simplifications that are common to many functions.  */
>    if (auto *call = redirect_pred_x ())
>      return call;
> +  if (pred != PRED_none)
> +    if (auto *call = shape->fold_pfalse (*this))
> +      return call;
>  
>    return base->fold (*this);
>  }
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
> index 4cdc0541bdc..4e443a8192e 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -632,12 +632,15 @@ public:
>    gcall *redirect_call (const function_instance &);
>    gimple *redirect_pred_x ();
>  
> +  bool arg_is_pfalse_p (unsigned int idx);
> +
>    gimple *fold_to_cstu (poly_uint64);
>    gimple *fold_to_pfalse ();
>    gimple *fold_to_ptrue ();
>    gimple *fold_to_vl_pred (unsigned int);
>    gimple *fold_const_binary (enum tree_code);
>    gimple *fold_active_lanes_to (tree);
> +  gimple *fold_by_pred (tree, tree, tree, tree);
>  
>    gimple *fold ();
>  
> @@ -796,6 +799,11 @@ public:
>    /* Check whether the given call is semantically valid.  Return true
>       if it is, otherwise report an error and return false.  */
>    virtual bool check (function_checker &) const { return true; }
> +
> +  /* For a pfalse predicate, try to fold the given gimple call.
> +     Return the new gimple statement on success, otherwise return null.  */
> +  virtual gimple *fold_pfalse (gimple_folder &) const { return NULL; }
> +
>  };
>  
>  /* RAII class for enabling enough SVE features to define the built-in
> @@ -829,6 +837,7 @@ extern tree acle_svprfop;
>  
>  bool vector_cst_all_same (tree, unsigned int);
>  bool is_ptrue (tree, unsigned int);
> +bool is_pfalse (tree);
>  const function_instance *lookup_fndecl (tree);
>  
>  /* Try to find a mode with the given mode_suffix_info fields.  Return the
> diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c
> new file mode 100644
> index 00000000000..3910ab36b6b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c
> @@ -0,0 +1,176 @@
> +#include <arm_sve.h>
> +
> +#define MXZ(F, RTY, TY1, TY2) \
> +  RTY F##_f (TY1 op1, TY2 op2) \
> +  { \
> +    return sv##F (svpfalse_b (), op1, op2); \
> +  }
> +
> +#define PRED_MXv(F, RTY, TYPE1, TYPE2, TY) \
> +  MXZ (F##_##TY##_m, sv##RTY, sv##TYPE1, sv##TYPE2) \
> +  MXZ (F##_##TY##_x, sv##RTY, sv##TYPE1, sv##TYPE2)
> +
> +#define PRED_Zv(F, RTY, TYPE1, TYPE2, TY) \
> +  MXZ (F##_##TY##_z, sv##RTY, sv##TYPE1, sv##TYPE2)
> +
> +#define PRED_MXZv(F, RTY, TYPE1, TYPE2, TY) \
> +  PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \
> +  PRED_Zv (F, RTY, TYPE1, TYPE2, TY)
> +
> +#define PRED_Z(F, RTY, TYPE1, TYPE2, TY) \
> +  PRED_Zv (F, RTY, TYPE1, TYPE2, TY) \
> +  MXZ (F##_n_##TY##_z, sv##RTY, sv##TYPE1, TYPE2)
> +
> +#define PRED_MXZ(F, RTY, TYPE1, TYPE2, TY) \
> +  PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \
> +  MXZ (F##_n_##TY##_m, sv##RTY, sv##TYPE1, TYPE2) \
> +  MXZ (F##_n_##TY##_x, sv##RTY, sv##TYPE1, TYPE2) \
> +  PRED_Z (F, RTY, TYPE1, TYPE2, TY)
> +
> +#define PRED_IMPLICIT(F, RTY, TYPE1, TYPE2, TY) \
> +  MXZ (F##_##TY, sv##RTY, sv##TYPE1, sv##TYPE2)
> +
> +#define ALL_Q_INTEGER(F, P) \
> +  PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \
> +  PRED_##P (F, int8_t, int8_t, int8_t, s8)
> +
> +#define ALL_Q_INTEGER_UINT(F, P) \
> +  PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \
> +  PRED_##P (F, int8_t, int8_t, uint8_t, s8)
> +
> +#define ALL_Q_INTEGER_INT(F, P) \
> +  PRED_##P (F, uint8_t, uint8_t, int8_t, u8) \
> +  PRED_##P (F, int8_t, int8_t, int8_t, s8)
> +
> +#define ALL_H_INTEGER(F, P) \
> +  PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \
> +  PRED_##P (F, int16_t, int16_t, int16_t, s16)
> +
> +#define ALL_H_INTEGER_UINT(F, P) \
> +  PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \
> +  PRED_##P (F, int16_t, int16_t, uint16_t, s16)
> +
> +#define ALL_H_INTEGER_INT(F, P) \
> +  PRED_##P (F, uint16_t, uint16_t, int16_t, u16) \
> +  PRED_##P (F, int16_t, int16_t, int16_t, s16)
> +
> +#define ALL_H_INTEGER_WIDE(F, P) \
> +  PRED_##P (F, uint16_t, uint16_t, uint8_t, u16) \
> +  PRED_##P (F, int16_t, int16_t, int8_t, s16)
> +
> +#define ALL_S_INTEGER(F, P) \
> +  PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \
> +  PRED_##P (F, int32_t, int32_t, int32_t, s32)
> +
> +#define ALL_S_INTEGER_UINT(F, P) \
> +  PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \
> +  PRED_##P (F, int32_t, int32_t, uint32_t, s32)
> +
> +#define ALL_S_INTEGER_INT(F, P) \
> +  PRED_##P (F, uint32_t, uint32_t, int32_t, u32) \
> +  PRED_##P (F, int32_t, int32_t, int32_t, s32)
> +
> +#define ALL_S_INTEGER_WIDE(F, P) \
> +  PRED_##P (F, uint32_t, uint32_t, uint16_t, u32) \
> +  PRED_##P (F, int32_t, int32_t, int16_t, s32)
> +
> +#define ALL_D_INTEGER(F, P) \
> +  PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \
> +  PRED_##P (F, int64_t, int64_t, int64_t, s64)
> +
> +#define ALL_D_INTEGER_UINT(F, P) \
> +  PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \
> +  PRED_##P (F, int64_t, int64_t, uint64_t, s64)
> +
> +#define ALL_D_INTEGER_INT(F, P) \
> +  PRED_##P (F, uint64_t, uint64_t, int64_t, u64) \
> +  PRED_##P (F, int64_t, int64_t, int64_t, s64)
> +
> +#define ALL_D_INTEGER_WIDE(F, P) \
> +  PRED_##P (F, uint64_t, uint64_t, uint32_t, u64) \
> +  PRED_##P (F, int64_t, int64_t, int32_t, s64)
> +
> +#define SD_INTEGER_TO_UINT(F, P) \
> +  PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \
> +  PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \
> +  PRED_##P (F, uint32_t, int32_t, int32_t, s32) \
> +  PRED_##P (F, uint64_t, int64_t, int64_t, s64)
> +
> +#define BHS_UNSIGNED_UINT64(F, P) \
> +  PRED_##P (F, uint8_t, uint8_t, uint64_t, u8) \
> +  PRED_##P (F, uint16_t, uint16_t, uint64_t, u16) \
> +  PRED_##P (F, uint32_t, uint32_t, uint64_t, u32)
> +
> +#define BHS_SIGNED_UINT64(F, P) \
> +  PRED_##P (F, int8_t, int8_t, uint64_t, s8) \
> +  PRED_##P (F, int16_t, int16_t, uint64_t, s16) \
> +  PRED_##P (F, int32_t, int32_t, uint64_t, s32)
> +
> +#define ALL_UNSIGNED_UINT(F, P) \
> +  PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \
> +  PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \
> +  PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \
> +  PRED_##P (F, uint64_t, uint64_t, uint64_t, u64)
> +
> +#define ALL_SIGNED_UINT(F, P) \
> +  PRED_##P (F, int8_t, int8_t, uint8_t, s8) \
> +  PRED_##P (F, int16_t, int16_t, uint16_t, s16) \
> +  PRED_##P (F, int32_t, int32_t, uint32_t, s32) \
> +  PRED_##P (F, int64_t, int64_t, uint64_t, s64)
> +
> +#define ALL_FLOAT(F, P) \
> +  PRED_##P (F, float16_t, float16_t, float16_t, f16) \
> +  PRED_##P (F, float32_t, float32_t, float32_t, f32) \
> +  PRED_##P (F, float64_t, float64_t, float64_t, f64)
> +
> +#define ALL_FLOAT_INT(F, P) \
> +  PRED_##P (F, float16_t, float16_t, int16_t, f16) \
> +  PRED_##P (F, float32_t, float32_t, int32_t, f32) \
> +  PRED_##P (F, float64_t, float64_t, int64_t, f64)
> +
> +#define B(F, P) \
> +  PRED_##P (F, bool_t, bool_t, bool_t, b)
> +
> +#define ALL_SD_INTEGER(F, P) \
> +  ALL_S_INTEGER (F, P) \
> +  ALL_D_INTEGER (F, P)
> +
> +#define HSD_INTEGER_WIDE(F, P) \
> +  ALL_H_INTEGER_WIDE (F, P) \
> +  ALL_S_INTEGER_WIDE (F, P) \
> +  ALL_D_INTEGER_WIDE (F, P)
> +
> +#define BHS_INTEGER_UINT64(F, P) \
> +  BHS_UNSIGNED_UINT64 (F, P) \
> +  BHS_SIGNED_UINT64 (F, P)
> +
> +#define ALL_INTEGER(F, P) \
> +  ALL_Q_INTEGER (F, P) \
> +  ALL_H_INTEGER (F, P) \
> +  ALL_S_INTEGER (F, P) \
> +  ALL_D_INTEGER (F, P)
> +
> +#define ALL_INTEGER_UINT(F, P) \
> +  ALL_Q_INTEGER_UINT (F, P) \
> +  ALL_H_INTEGER_UINT (F, P) \
> +  ALL_S_INTEGER_UINT (F, P) \
> +  ALL_D_INTEGER_UINT (F, P)
> +
> +#define ALL_INTEGER_INT(F, P) \
> +  ALL_Q_INTEGER_INT (F, P) \
> +  ALL_H_INTEGER_INT (F, P) \
> +  ALL_S_INTEGER_INT (F, P) \
> +  ALL_D_INTEGER_INT (F, P)
> +
> +#define ALL_FLOAT_AND_SD_INTEGER(F, P) \
> +  ALL_SD_INTEGER (F, P) \
> +  ALL_FLOAT (F, P)
> +
> +#define ALL_ARITH(F, P) \
> +  ALL_INTEGER (F, P) \
> +  ALL_FLOAT (F, P)
> +
> +#define ALL_DATA(F, P) \
> +  ALL_ARITH (F, P) \
> +  PRED_##P (F, bfloat16_t, bfloat16_t, bfloat16_t, bf16)
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c
> new file mode 100644
> index 00000000000..ef629413185
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "../pfalse-binary_0.c"
> +
> +B (brkn, Zv)
> +B (brkpa, Zv)
> +B (brkpb, Zv)
> +ALL_DATA (splice, IMPLICIT)
> +
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 3 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0\.d, z1\.d\n\tret\n} 12 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c
> new file mode 100644
> index 00000000000..91d574f9249
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "../pfalse-binary_0.c"
> +
> +ALL_FLOAT_INT (scale, MXZ)
> +
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c
> new file mode 100644
> index 00000000000..25c793ff40f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "../pfalse-binary_0.c"
> +
> +ALL_ARITH (abd, MXZ)
> +ALL_ARITH (add, MXZ)
> +ALL_INTEGER (and, MXZ)
> +B (and, Zv)
> +ALL_INTEGER (bic, MXZ)
> +B (bic, Zv)
> +ALL_FLOAT_AND_SD_INTEGER (div, MXZ)
> +ALL_FLOAT_AND_SD_INTEGER (divr, MXZ)
> +ALL_INTEGER (eor, MXZ)
> +B (eor, Zv)
> +ALL_ARITH (mul, MXZ)
> +ALL_INTEGER (mulh, MXZ)
> +ALL_FLOAT (mulx, MXZ)
> +B (nand, Zv)
> +B (nor, Zv)
> +B (orn, Zv)
> +ALL_INTEGER (orr, MXZ)
> +B (orr, Zv)
> +ALL_ARITH (sub, MXZ)
> +ALL_ARITH (subr, MXZ)
> +
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 448 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 224 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 7 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 679 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c
> new file mode 100644
> index 00000000000..8d187c22eec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "../pfalse-binary_0.c"
> +
> +ALL_ARITH (max, MXZ)
> +ALL_ARITH (min, MXZ)
> +ALL_FLOAT (maxnm, MXZ)
> +ALL_FLOAT (minnm, MXZ)
> +
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 112 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 56 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 168 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c
> new file mode 100644
> index 00000000000..9940866d5ef
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include <arm_sve.h>
> +
> +#define MXZ4(F, TYPE) \
> +  TYPE F##_f (TYPE op1, TYPE op2) \
> +  { \
> +    return sv##F (svpfalse_b (), op1, op2, 90); \
> +  }
> +
> +#define PRED_MXZ(F, TYPE, TY) \
> +  MXZ4 (F##_##TY##_m, TYPE) \
> +  MXZ4 (F##_##TY##_x, TYPE) \
> +  MXZ4 (F##_##TY##_z, TYPE)
> +
> +#define ALL_FLOAT(F, P) \
> +  PRED_##P (F, svfloat16_t, f16) \
> +  PRED_##P (F, svfloat32_t, f32) \
> +  PRED_##P (F, svfloat64_t, f64)
> +
> +ALL_FLOAT (cadd, MXZ)
> +
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 6 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 3 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c
> new file mode 100644
> index 00000000000..f8fd18043e6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "../pfalse-binary_0.c"
> +
> +BHS_SIGNED_UINT64 (asr_wide, MXZ)
> +BHS_INTEGER_UINT64 (lsl_wide, MXZ)
> +BHS_UNSIGNED_UINT64 (lsr_wide, MXZ)
> +
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 48 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 24 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c
> new file mode 100644
> index 00000000000..2f1d7721bc8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "../pfalse-binary_0.c"
> +
> +ALL_SIGNED_UINT (asr, MXZ)
> +ALL_INTEGER_UINT (lsl, MXZ)
> +ALL_UNSIGNED_UINT (lsr, MXZ)
> +
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 64 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 32 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 96 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c
> new file mode 100644
> index 00000000000..723fcd0a203
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "../pfalse-binary_0.c"
> +
> +ALL_ARITH (addp, MXv)
> +ALL_ARITH (maxp, MXv)
> +ALL_FLOAT (maxnmp, MXv)
> +ALL_ARITH (minp, MXv)
> +ALL_FLOAT (minnmp, MXv)
> +
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 78 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 78 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c
> new file mode 100644
> index 00000000000..6e8be86f9b0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "../pfalse-binary_0.c"
> +
> +ALL_INTEGER_INT (rshl, MXZ)
> +
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 32 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 16 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 48 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c
> new file mode 100644
> index 00000000000..7335a4ff011
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "../pfalse-binary_0.c"
> +
> +ALL_INTEGER (hadd, MXZ)
> +ALL_INTEGER (hsub, MXZ)
> +ALL_INTEGER (hsubr, MXZ)
> +ALL_INTEGER (qadd, MXZ)
> +ALL_INTEGER (qsub, MXZ)
> +ALL_INTEGER (qsubr, MXZ)
> +ALL_INTEGER (rhadd, MXZ)
> +
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 224 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 112 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 336 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c
> new file mode 100644
> index 00000000000..e03e1e890f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "../pfalse-binary_0.c"
> +
> +SD_INTEGER_TO_UINT (histcnt, Zv)
> +
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 4 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c
> new file mode 100644
> index 00000000000..2649fc01954
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "../pfalse-binary_0.c"
> +
> +ALL_SIGNED_UINT (uqadd, MXZ)
> +
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 16 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 8 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c
> new file mode 100644
> index 00000000000..72693d01ad0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include "../pfalse-binary_0.c"
> +
> +HSD_INTEGER_WIDE (adalp, MXZv)
> +
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */
> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */