Re: [PATCH] [PR106329] SVE intrinsics: Fold calls with pfalse predicate.

Richard Sandiford Wed, 06 Nov 2024 23:49:33 -0800

Richard Sandiford <richard.sandif...@arm.com> writes:
> Thanks for doing this and sorry for the slow review.
>
> Jennifer Schmitz <jschm...@nvidia.com> writes:
>> If an SVE intrinsic has predicate pfalse, we can fold the call to
>> a simplified assignment statement: For _m, _x, and implicit predication,
>> the LHS can be assigned the operand for inactive values and for _z, we can
>> assign a zero vector.
>
> _x functions don't really specify the values of inactive lanes.
> The values are instead "don't care" and so we have a free choice.
> Using zeros (as for _z) should be more efficient than reusing one
> of the inputs.
>
>> For example,
>> svint32_t foo (svint32_t op1, svint32_t op2)
>> {
>>   return svadd_s32_m (svpfalse_b (), op1, op2);
>> }
>> can be folded to lhs <- op1, such that foo is compiled to just a RET.
>>
>> We implemented this optimization during gimple folding by calling a new 
>> method
>> function_shape::fold_pfalse from gimple_folder::fold.
>> The implementations of fold_pfalse in the function_shape subclasses define 
>> the
>> expected behavior for a pfalse predicate for the different predications
>> and return an appropriate gimple statement.
>> To avoid code duplication, function_shape::fold_pfalse calls a new method
>> gimple_folder:fold_by_pred that takes arguments of type tree for each
>> predication and returns the new assignment statement depending on the
>> predication.
>>
>> We tested the new behavior for each intrinsic with all supported predications
>> and data types and checked the produced assembly. There is a test file
>> for each shape subclass with scan-assembler-times tests that look for
>> the simplified instruction sequences, such as individual RET instructions
>> or zeroing moves. There is an additional directive counting the total number 
>> of
>> functions in the test, which must be the sum of counts of all other
>> directives. This is to check that all tested intrinsics were optimized.
>>
>> In this patch, we only implemented function_shape::fold_pfalse for
>> binary shapes. But we plan to cover more shapes in follow-up patches,
>> after getting feedback on this patch.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>
> My main question is: can we not do this generically for _z, _x and _m,
> without the virtual method?  (I wanted to prototype this locally before
> asking, hence the delay, but never found time.)  The only snag is that
> we need to check whether the inactive lanes for _m are specified using
> a separate parameter before the predicate or whether they're taken from
> the parameter after the predicate.  But that should be a generic rule
> that can be checked by looking at the types.
>
> That is, if the predication is PRED_z, PRED_m, or PRED_x, then I think
> we can say that:
>
> (1) if the first argument is a vector and the second argument is a pfalse
>     predicate, the call folds to the first argument
>
> (2) if the first argument is a pfalse predicate and predication is PRED_m,
>     the call folds to the second argument
>
> (3) if the first argument is a pfalse predicate and predication is
>     not PRED_m, the call folds to zero.
>
> But perhaps I've forgotten a case.
>
> The implicit cases would still need to be handled separately.
> Perhaps for that case we could add a general fold method to the shape?
> I don't think we need to specialise the name of the virtual function
> beyond just "fold".


Actually, thinking more about it: I think we can handle some implicit cases
directly too.  If the call properties include CP_READ_MEMORY then the result
should be zero.  If the call properties include CP_WRITE_MEMORY or
CP_PREFETCH_MEMORY then the call can be replaced by a nop.

For the others, it would probably be better to stick to function-specific
fold routines, rather than do it based on the shape.  We could use common
base classes for things like reductions.

Thanks,
Richard

>
> It would be good to put:
>
>> +  gimple_seq stmts = NULL;
>> +  gimple *g = gimple_build_assign (lhs, new_lhs);
>> +  gimple_seq_add_stmt_without_update (&stmts, g);
>> +  gsi_replace_with_seq_vops (gsi, stmts);
>> +  return g;
>
> into a helper so that we can use it for any g.  (Maybe with yet another
> helper for folding to a gimple value, as here.)  Some of the existing
> routines could use that helper too, so it would make a natural prepatch.
>
> I think I've made that sound more complicated than it really is, sorry.
> But I think the net effect should be less code, with automatic support
> for unary operations.
>
> Thanks,
> Richard
>
>> @@ -3666,6 +3683,39 @@ gimple_folder::fold_active_lanes_to (tree x)
>>    return gimple_build_assign (lhs, VEC_COND_EXPR, pred, x, vec_inactive);
>>  }
>>  
>> +/* Fold call to assignment statement
>> +   lhs = new_lhs,
>> +   where new_lhs is determined by the predication.
>> +   Return the gimple statement on success, else return NULL.  */
>> +gimple *
>> +gimple_folder::fold_by_pred (tree m, tree x, tree z, tree implicit)
>> +{
>> +  tree new_lhs = NULL;
>> +  switch (pred)
>> +    {
>> +    case PRED_z:
>> +      new_lhs = z;
>> +      break;
>> +    case PRED_m:
>> +      new_lhs = m;
>> +      break;
>> +    case PRED_x:
>> +      new_lhs = x;
>> +      break;
>> +    case PRED_implicit:
>> +      new_lhs = implicit;
>> +      break;
>> +    default:
>> +      return NULL;
>> +    }
>> +  gcc_assert (new_lhs);
>> +  gimple_seq stmts = NULL;
>> +  gimple *g = gimple_build_assign (lhs, new_lhs);
>> +  gimple_seq_add_stmt_without_update (&stmts, g);
>> +  gsi_replace_with_seq_vops (gsi, stmts);
>> +  return g;
>> +}
>> +
>>  /* Try to fold the call.  Return the new statement on success and null
>>     on failure.  */
>>  gimple *
>> @@ -3685,6 +3735,9 @@ gimple_folder::fold ()
>>    /* First try some simplifications that are common to many functions.  */
>>    if (auto *call = redirect_pred_x ())
>>      return call;
>> +  if (pred != PRED_none)
>> +    if (auto *call = shape->fold_pfalse (*this))
>> +      return call;
>>  
>>    return base->fold (*this);
>>  }
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h 
>> b/gcc/config/aarch64/aarch64-sve-builtins.h
>> index 4cdc0541bdc..4e443a8192e 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
>> @@ -632,12 +632,15 @@ public:
>>    gcall *redirect_call (const function_instance &);
>>    gimple *redirect_pred_x ();
>>  
>> +  bool arg_is_pfalse_p (unsigned int idx);
>> +
>>    gimple *fold_to_cstu (poly_uint64);
>>    gimple *fold_to_pfalse ();
>>    gimple *fold_to_ptrue ();
>>    gimple *fold_to_vl_pred (unsigned int);
>>    gimple *fold_const_binary (enum tree_code);
>>    gimple *fold_active_lanes_to (tree);
>> +  gimple *fold_by_pred (tree, tree, tree, tree);
>>  
>>    gimple *fold ();
>>  
>> @@ -796,6 +799,11 @@ public:
>>    /* Check whether the given call is semantically valid.  Return true
>>       if it is, otherwise report an error and return false.  */
>>    virtual bool check (function_checker &) const { return true; }
>> +
>> +  /* For a pfalse predicate, try to fold the given gimple call.
>> +     Return the new gimple statement on success, otherwise return null.  */
>> +  virtual gimple *fold_pfalse (gimple_folder &) const { return NULL; }
>> +
>>  };
>>  
>>  /* RAII class for enabling enough SVE features to define the built-in
>> @@ -829,6 +837,7 @@ extern tree acle_svprfop;
>>  
>>  bool vector_cst_all_same (tree, unsigned int);
>>  bool is_ptrue (tree, unsigned int);
>> +bool is_pfalse (tree);
>>  const function_instance *lookup_fndecl (tree);
>>  
>>  /* Try to find a mode with the given mode_suffix_info fields.  Return the
>> diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c 
>> b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c
>> new file mode 100644
>> index 00000000000..3910ab36b6b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c
>> @@ -0,0 +1,176 @@
>> +#include <arm_sve.h>
>> +
>> +#define MXZ(F, RTY, TY1, TY2)                                       \
>> +  RTY F##_f (TY1 op1, TY2 op2)                                      \
>> +  {                                                         \
>> +    return sv##F (svpfalse_b (), op1, op2);                 \
>> +  }
>> +
>> +#define PRED_MXv(F, RTY, TYPE1, TYPE2, TY)                  \
>> +  MXZ (F##_##TY##_m, sv##RTY, sv##TYPE1, sv##TYPE2)         \
>> +  MXZ (F##_##TY##_x, sv##RTY, sv##TYPE1, sv##TYPE2)
>> +
>> +#define PRED_Zv(F, RTY, TYPE1, TYPE2, TY)                   \
>> +  MXZ (F##_##TY##_z, sv##RTY, sv##TYPE1, sv##TYPE2)
>> +
>> +#define PRED_MXZv(F, RTY, TYPE1, TYPE2, TY)                 \
>> +  PRED_MXv (F, RTY, TYPE1, TYPE2, TY)                               \
>> +  PRED_Zv (F, RTY, TYPE1, TYPE2, TY)
>> +
>> +#define PRED_Z(F, RTY, TYPE1, TYPE2, TY)                    \
>> +  PRED_Zv (F, RTY, TYPE1, TYPE2, TY)                                \
>> +  MXZ (F##_n_##TY##_z, sv##RTY, sv##TYPE1, TYPE2)
>> +
>> +#define PRED_MXZ(F, RTY, TYPE1, TYPE2, TY)                  \
>> +  PRED_MXv (F, RTY, TYPE1, TYPE2, TY)                               \
>> +  MXZ (F##_n_##TY##_m, sv##RTY, sv##TYPE1, TYPE2)           \
>> +  MXZ (F##_n_##TY##_x, sv##RTY, sv##TYPE1, TYPE2)           \
>> +  PRED_Z (F, RTY, TYPE1, TYPE2, TY)
>> +
>> +#define PRED_IMPLICIT(F, RTY, TYPE1, TYPE2, TY)                     \
>> +  MXZ (F##_##TY, sv##RTY, sv##TYPE1, sv##TYPE2)
>> +
>> +#define ALL_Q_INTEGER(F, P)                                 \
>> +  PRED_##P (F, uint8_t, uint8_t, uint8_t, u8)                       \
>> +  PRED_##P (F, int8_t, int8_t, int8_t, s8)
>> +
>> +#define ALL_Q_INTEGER_UINT(F, P)                            \
>> +  PRED_##P (F, uint8_t, uint8_t, uint8_t, u8)                       \
>> +  PRED_##P (F, int8_t, int8_t, uint8_t, s8)
>> +
>> +#define ALL_Q_INTEGER_INT(F, P)                                     \
>> +  PRED_##P (F, uint8_t, uint8_t, int8_t, u8)                        \
>> +  PRED_##P (F, int8_t, int8_t, int8_t, s8)
>> +
>> +#define ALL_H_INTEGER(F, P)                                 \
>> +  PRED_##P (F, uint16_t, uint16_t, uint16_t, u16)           \
>> +  PRED_##P (F, int16_t, int16_t, int16_t, s16)
>> +
>> +#define ALL_H_INTEGER_UINT(F, P)                            \
>> +  PRED_##P (F, uint16_t, uint16_t, uint16_t, u16)           \
>> +  PRED_##P (F, int16_t, int16_t, uint16_t, s16)
>> +
>> +#define ALL_H_INTEGER_INT(F, P)                                     \
>> +  PRED_##P (F, uint16_t, uint16_t, int16_t, u16)            \
>> +  PRED_##P (F, int16_t, int16_t, int16_t, s16)
>> +
>> +#define ALL_H_INTEGER_WIDE(F, P)                            \
>> +  PRED_##P (F, uint16_t, uint16_t, uint8_t, u16)            \
>> +  PRED_##P (F, int16_t, int16_t, int8_t, s16)
>> +
>> +#define ALL_S_INTEGER(F, P)                                 \
>> +  PRED_##P (F, uint32_t, uint32_t, uint32_t, u32)           \
>> +  PRED_##P (F, int32_t, int32_t, int32_t, s32)
>> +
>> +#define ALL_S_INTEGER_UINT(F, P)                            \
>> +  PRED_##P (F, uint32_t, uint32_t, uint32_t, u32)           \
>> +  PRED_##P (F, int32_t, int32_t, uint32_t, s32)
>> +
>> +#define ALL_S_INTEGER_INT(F, P)                                     \
>> +  PRED_##P (F, uint32_t, uint32_t, int32_t, u32)            \
>> +  PRED_##P (F, int32_t, int32_t, int32_t, s32)
>> +
>> +#define ALL_S_INTEGER_WIDE(F, P)                            \
>> +  PRED_##P (F, uint32_t, uint32_t, uint16_t, u32)           \
>> +  PRED_##P (F, int32_t, int32_t, int16_t, s32)
>> +
>> +#define ALL_D_INTEGER(F, P)                                 \
>> +  PRED_##P (F, uint64_t, uint64_t, uint64_t, u64)           \
>> +  PRED_##P (F, int64_t, int64_t, int64_t, s64)
>> +
>> +#define ALL_D_INTEGER_UINT(F, P)                            \
>> +  PRED_##P (F, uint64_t, uint64_t, uint64_t, u64)           \
>> +  PRED_##P (F, int64_t, int64_t, uint64_t, s64)
>> +
>> +#define ALL_D_INTEGER_INT(F, P)                                     \
>> +  PRED_##P (F, uint64_t, uint64_t, int64_t, u64)            \
>> +  PRED_##P (F, int64_t, int64_t, int64_t, s64)
>> +
>> +#define ALL_D_INTEGER_WIDE(F, P)                            \
>> +  PRED_##P (F, uint64_t, uint64_t, uint32_t, u64)           \
>> +  PRED_##P (F, int64_t, int64_t, int32_t, s64)
>> +
>> +#define SD_INTEGER_TO_UINT(F, P)                            \
>> +  PRED_##P (F, uint32_t, uint32_t, uint32_t, u32)           \
>> +  PRED_##P (F, uint64_t, uint64_t, uint64_t, u64)           \
>> +  PRED_##P (F, uint32_t, int32_t, int32_t, s32)                     \
>> +  PRED_##P (F, uint64_t, int64_t, int64_t, s64)
>> +
>> +#define BHS_UNSIGNED_UINT64(F, P)                           \
>> +  PRED_##P (F, uint8_t, uint8_t, uint64_t, u8)                      \
>> +  PRED_##P (F, uint16_t, uint16_t, uint64_t, u16)           \
>> +  PRED_##P (F, uint32_t, uint32_t, uint64_t, u32)
>> +
>> +#define BHS_SIGNED_UINT64(F, P)                                     \
>> +  PRED_##P (F, int8_t, int8_t, uint64_t, s8)                        \
>> +  PRED_##P (F, int16_t, int16_t, uint64_t, s16)                     \
>> +  PRED_##P (F, int32_t, int32_t, uint64_t, s32)
>> +
>> +#define ALL_UNSIGNED_UINT(F, P)                                     \
>> +  PRED_##P (F, uint8_t, uint8_t, uint8_t, u8)                       \
>> +  PRED_##P (F, uint16_t, uint16_t, uint16_t, u16)           \
>> +  PRED_##P (F, uint32_t, uint32_t, uint32_t, u32)           \
>> +  PRED_##P (F, uint64_t, uint64_t, uint64_t, u64)
>> +
>> +#define ALL_SIGNED_UINT(F, P)                                       \
>> +  PRED_##P (F, int8_t, int8_t, uint8_t, s8)                 \
>> +  PRED_##P (F, int16_t, int16_t, uint16_t, s16)                     \
>> +  PRED_##P (F, int32_t, int32_t, uint32_t, s32)                     \
>> +  PRED_##P (F, int64_t, int64_t, uint64_t, s64)
>> +
>> +#define ALL_FLOAT(F, P)                                             \
>> +  PRED_##P (F, float16_t, float16_t, float16_t, f16)                \
>> +  PRED_##P (F, float32_t, float32_t, float32_t, f32)                \
>> +  PRED_##P (F, float64_t, float64_t, float64_t, f64)
>> +
>> +#define ALL_FLOAT_INT(F, P)                                 \
>> +  PRED_##P (F, float16_t, float16_t, int16_t, f16)          \
>> +  PRED_##P (F, float32_t, float32_t, int32_t, f32)          \
>> +  PRED_##P (F, float64_t, float64_t, int64_t, f64)
>> +
>> +#define B(F, P)                                                     \
>> +  PRED_##P (F, bool_t, bool_t, bool_t, b)
>> +
>> +#define ALL_SD_INTEGER(F, P)                                        \
>> +  ALL_S_INTEGER (F, P)                                              \
>> +  ALL_D_INTEGER (F, P)
>> +
>> +#define HSD_INTEGER_WIDE(F, P)                                      \
>> +  ALL_H_INTEGER_WIDE (F, P)                                 \
>> +  ALL_S_INTEGER_WIDE (F, P)                                 \
>> +  ALL_D_INTEGER_WIDE (F, P)
>> +
>> +#define BHS_INTEGER_UINT64(F, P)                            \
>> +  BHS_UNSIGNED_UINT64 (F, P)                                        \
>> +  BHS_SIGNED_UINT64 (F, P)
>> +
>> +#define ALL_INTEGER(F, P)                                   \
>> +  ALL_Q_INTEGER (F, P)                                              \
>> +  ALL_H_INTEGER (F, P)                                              \
>> +  ALL_S_INTEGER (F, P)                                              \
>> +  ALL_D_INTEGER (F, P)
>> +
>> +#define ALL_INTEGER_UINT(F, P)                                      \
>> +  ALL_Q_INTEGER_UINT (F, P)                                 \
>> +  ALL_H_INTEGER_UINT (F, P)                                 \
>> +  ALL_S_INTEGER_UINT (F, P)                                 \
>> +  ALL_D_INTEGER_UINT (F, P)
>> +
>> +#define ALL_INTEGER_INT(F, P)                                       \
>> +  ALL_Q_INTEGER_INT (F, P)                                  \
>> +  ALL_H_INTEGER_INT (F, P)                                  \
>> +  ALL_S_INTEGER_INT (F, P)                                  \
>> +  ALL_D_INTEGER_INT (F, P)
>> +
>> +#define ALL_FLOAT_AND_SD_INTEGER(F, P)                              \
>> +  ALL_SD_INTEGER (F, P)                                             \
>> +  ALL_FLOAT (F, P)
>> +
>> +#define ALL_ARITH(F, P)                                             \
>> +  ALL_INTEGER (F, P)                                                \
>> +  ALL_FLOAT (F, P)
>> +
>> +#define ALL_DATA(F, P)                                              \
>> +  ALL_ARITH (F, P)                                          \
>> +  PRED_##P (F, bfloat16_t, bfloat16_t, bfloat16_t, bf16)
>> +
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c
>> new file mode 100644
>> index 00000000000..ef629413185
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +B (brkn, Zv)
>> +B (brkpa, Zv)
>> +B (brkpb, Zv)
>> +ALL_DATA (splice, IMPLICIT)
>> +
>> +/* { dg-final { scan-assembler-times 
>> {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 3 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0\.d, 
>> z1\.d\n\tret\n} 12 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c
>> new file mode 100644
>> index 00000000000..91d574f9249
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_FLOAT_INT (scale, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */
>> +/* { dg-final { scan-assembler-times 
>> {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 
>> 6 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c
>> new file mode 100644
>> index 00000000000..25c793ff40f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c
>> @@ -0,0 +1,30 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_ARITH (abd, MXZ)
>> +ALL_ARITH (add, MXZ)
>> +ALL_INTEGER (and, MXZ)
>> +B (and, Zv)
>> +ALL_INTEGER (bic, MXZ)
>> +B (bic, Zv)
>> +ALL_FLOAT_AND_SD_INTEGER (div, MXZ)
>> +ALL_FLOAT_AND_SD_INTEGER (divr, MXZ)
>> +ALL_INTEGER (eor, MXZ)
>> +B (eor, Zv)
>> +ALL_ARITH (mul, MXZ)
>> +ALL_INTEGER (mulh, MXZ)
>> +ALL_FLOAT (mulx, MXZ)
>> +B (nand, Zv)
>> +B (nor, Zv)
>> +B (orn, Zv)
>> +ALL_INTEGER (orr, MXZ)
>> +B (orr, Zv)
>> +ALL_ARITH (sub, MXZ)
>> +ALL_ARITH (subr, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 448 } } */
>> +/* { dg-final { scan-assembler-times 
>> {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 
>> 224 } } */
>> +/* { dg-final { scan-assembler-times 
>> {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 7 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 679 } } */
>> diff --git 
>> a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c
>> new file mode 100644
>> index 00000000000..8d187c22eec
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_ARITH (max, MXZ)
>> +ALL_ARITH (min, MXZ)
>> +ALL_FLOAT (maxnm, MXZ)
>> +ALL_FLOAT (minnm, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 112 } } */
>> +/* { dg-final { scan-assembler-times 
>> {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 
>> 56 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 168 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c
>> new file mode 100644
>> index 00000000000..9940866d5ef
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include <arm_sve.h>
>> +
>> +#define MXZ4(F, TYPE)                                               \
>> +  TYPE F##_f (TYPE op1, TYPE op2)                           \
>> +  {                                                         \
>> +    return sv##F (svpfalse_b (), op1, op2, 90);     \
>> +  }
>> +
>> +#define PRED_MXZ(F, TYPE, TY)                                       \
>> +  MXZ4 (F##_##TY##_m, TYPE)                                 \
>> +  MXZ4 (F##_##TY##_x, TYPE)                                 \
>> +  MXZ4 (F##_##TY##_z, TYPE)
>> +
>> +#define ALL_FLOAT(F, P)                                             \
>> +  PRED_##P (F, svfloat16_t, f16)                            \
>> +  PRED_##P (F, svfloat32_t, f32)                            \
>> +  PRED_##P (F, svfloat64_t, f64)
>> +
>> +ALL_FLOAT (cadd, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 6 } } */
>> +/* { dg-final { scan-assembler-times 
>> {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 
>> 3 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */
>> diff --git 
>> a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c
>> new file mode 100644
>> index 00000000000..f8fd18043e6
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +BHS_SIGNED_UINT64 (asr_wide, MXZ)
>> +BHS_INTEGER_UINT64 (lsl_wide, MXZ)
>> +BHS_UNSIGNED_UINT64 (lsr_wide, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 48 } } */
>> +/* { dg-final { scan-assembler-times 
>> {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 
>> 24 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c
>> new file mode 100644
>> index 00000000000..2f1d7721bc8
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_SIGNED_UINT (asr, MXZ)
>> +ALL_INTEGER_UINT (lsl, MXZ)
>> +ALL_UNSIGNED_UINT (lsr, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 64 } } */
>> +/* { dg-final { scan-assembler-times 
>> {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 
>> 32 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 96 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c
>> new file mode 100644
>> index 00000000000..723fcd0a203
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_ARITH (addp, MXv)
>> +ALL_ARITH (maxp, MXv)
>> +ALL_FLOAT (maxnmp, MXv)
>> +ALL_ARITH (minp, MXv)
>> +ALL_FLOAT (minnmp, MXv)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 78 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 78 } } */
>> diff --git 
>> a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c
>> new file mode 100644
>> index 00000000000..6e8be86f9b0
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_INTEGER_INT (rshl, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 32 } } */
>> +/* { dg-final { scan-assembler-times 
>> {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 
>> 16 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 48 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c
>> new file mode 100644
>> index 00000000000..7335a4ff011
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_INTEGER (hadd, MXZ)
>> +ALL_INTEGER (hsub, MXZ)
>> +ALL_INTEGER (hsubr, MXZ)
>> +ALL_INTEGER (qadd, MXZ)
>> +ALL_INTEGER (qsub, MXZ)
>> +ALL_INTEGER (qsubr, MXZ)
>> +ALL_INTEGER (rhadd, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 224 } } */
>> +/* { dg-final { scan-assembler-times 
>> {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 
>> 112 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 336 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c
>> new file mode 100644
>> index 00000000000..e03e1e890f8
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c
>> @@ -0,0 +1,9 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +SD_INTEGER_TO_UINT (histcnt, Zv)
>> +
>> +/* { dg-final { scan-assembler-times 
>> {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 
>> 4 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */
>> diff --git 
>> a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c
>> new file mode 100644
>> index 00000000000..2649fc01954
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_SIGNED_UINT (uqadd, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 16 } } */
>> +/* { dg-final { scan-assembler-times 
>> {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 
>> 8 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c
>> new file mode 100644
>> index 00000000000..72693d01ad0
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +HSD_INTEGER_WIDE (adalp, MXZv)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */
>> +/* { dg-final { scan-assembler-times 
>> {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 
>> 6 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */

Re: [PATCH] [PR106329] SVE intrinsics: Fold calls with pfalse predicate.

Reply via email to