Richard Sandiford <richard.sandif...@arm.com> writes:
> Thanks for doing this and sorry for the slow review.
>
> Jennifer Schmitz <jschm...@nvidia.com> writes:
>> If an SVE intrinsic has predicate pfalse, we can fold the call to
>> a simplified assignment statement: For _m, _x, and implicit predication,
>> the LHS can be assigned the operand for inactive values and for _z, we can
>> assign a zero vector.
>
> _x functions don't really specify the values of inactive lanes.
> The values are instead "don't care" and so we have a free choice.
> Using zeros (as for _z) should be more efficient than reusing one
> of the inputs.
>
>> For example,
>> svint32_t foo (svint32_t op1, svint32_t op2)
>> {
>>   return svadd_s32_m (svpfalse_b (), op1, op2);
>> }
>> can be folded to lhs <- op1, such that foo is compiled to just a RET.
>>
>> We implemented this optimization during gimple folding by calling a new
>> method function_shape::fold_pfalse from gimple_folder::fold.
>> The implementations of fold_pfalse in the function_shape subclasses define
>> the expected behavior for a pfalse predicate for the different predications
>> and return an appropriate gimple statement.
>> To avoid code duplication, function_shape::fold_pfalse calls a new method
>> gimple_folder::fold_by_pred that takes arguments of type tree for each
>> predication and returns the new assignment statement depending on the
>> predication.
>>
>> We tested the new behavior for each intrinsic with all supported predications
>> and data types and checked the produced assembly. There is a test file
>> for each shape subclass with scan-assembler-times tests that look for
>> the simplified instruction sequences, such as individual RET instructions
>> or zeroing moves. There is an additional directive counting the total number
>> of functions in the test, which must be the sum of counts of all other
>> directives. This is to check that all tested intrinsics were optimized.
>>
>> In this patch, we only implemented function_shape::fold_pfalse for
>> binary shapes. But we plan to cover more shapes in follow-up patches,
>> after getting feedback on this patch.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>
> My main question is: can we not do this generically for _z, _x and _m,
> without the virtual method? (I wanted to prototype this locally before
> asking, hence the delay, but never found time.) The only snag is that
> we need to check whether the inactive lanes for _m are specified using
> a separate parameter before the predicate or whether they're taken from
> the parameter after the predicate. But that should be a generic rule
> that can be checked by looking at the types.
>
> That is, if the predication is PRED_z, PRED_m, or PRED_x, then I think
> we can say that:
>
> (1) if the first argument is a vector and the second argument is a pfalse
>     predicate, the call folds to the first argument
>
> (2) if the first argument is a pfalse predicate and predication is PRED_m,
>     the call folds to the second argument
>
> (3) if the first argument is a pfalse predicate and predication is
>     not PRED_m, the call folds to zero.
>
> But perhaps I've forgotten a case.
>
> The implicit cases would still need to be handled separately.
> Perhaps for that case we could add a general fold method to the shape?
> I don't think we need to specialise the name of the virtual function
> beyond just "fold".
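
To make rules (1) to (3) concrete, here is a minimal standalone sketch of the classification. It is not GCC code: the enums Pred, ArgKind and Fold and the function classify_pfalse_fold are hypothetical stand-ins for illustration, and a real implementation would inspect the gimple call arguments and their types rather than a vector of tags.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not GCC types.
enum class Pred { z, m, x, implicit, none };
enum class ArgKind { vector, pfalse, other };
enum class Fold { none, to_arg0, to_arg1, to_zero };

// Classify how a call could fold when its governing predicate is pfalse,
// following rules (1)-(3) from the review.
Fold classify_pfalse_fold (Pred pred, const std::vector<ArgKind> &args)
{
  // The rules only cover explicit _z, _m and _x predication.
  if (pred != Pred::z && pred != Pred::m && pred != Pred::x)
    return Fold::none;

  // (1) The inactive lanes are given by a separate vector argument
  //     that precedes the predicate.
  if (args.size () >= 2
      && args[0] == ArgKind::vector
      && args[1] == ArgKind::pfalse)
    return Fold::to_arg0;

  if (!args.empty () && args[0] == ArgKind::pfalse)
    {
      // (3) _z and _x fold to zero.
      if (pred != Pred::m)
	return Fold::to_zero;
      // (2) _m takes the inactive lanes from the argument after the predicate.
      return args.size () >= 2 ? Fold::to_arg1 : Fold::none;
    }

  return Fold::none;
}
```

Under these assumptions, the svadd_s32_m (svpfalse_b (), op1, op2) example from the patch description falls under rule (2) and folds to op1, while a call whose first argument is the "inactive" vector and whose second is the predicate falls under rule (1).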
Actually, thinking more about it: I think we can handle some implicit
cases directly too. If the call properties include CP_READ_MEMORY
then the result should be zero. If the call properties include
CP_WRITE_MEMORY or CP_PREFETCH_MEMORY then the call can be replaced by
a nop.

For the others, it would probably be better to stick to function-specific
fold routines, rather than do it based on the shape. We could use
common base classes for things like reductions.

Thanks,
Richard

>
> It would be good to put:
>
>> +  gimple_seq stmts = NULL;
>> +  gimple *g = gimple_build_assign (lhs, new_lhs);
>> +  gimple_seq_add_stmt_without_update (&stmts, g);
>> +  gsi_replace_with_seq_vops (gsi, stmts);
>> +  return g;
>
> into a helper so that we can use it for any g. (Maybe with yet another
> helper for folding to a gimple value, as here.) Some of the existing
> routines could use that helper too, so it would make a natural prepatch.
>
> I think I've made that sound more complicated than it really is, sorry.
> But I think the net effect should be less code, with automatic support
> for unary operations.
>
> Thanks,
> Richard
>
>> @@ -3666,6 +3683,39 @@ gimple_folder::fold_active_lanes_to (tree x)
>>    return gimple_build_assign (lhs, VEC_COND_EXPR, pred, x, vec_inactive);
>>  }
>>
>> +/* Fold call to assignment statement
>> +     lhs = new_lhs,
>> +   where new_lhs is determined by the predication.
>> +   Return the gimple statement on success, else return NULL.  */
>> +gimple *
>> +gimple_folder::fold_by_pred (tree m, tree x, tree z, tree implicit)
>> +{
>> +  tree new_lhs = NULL;
>> +  switch (pred)
>> +    {
>> +    case PRED_z:
>> +      new_lhs = z;
>> +      break;
>> +    case PRED_m:
>> +      new_lhs = m;
>> +      break;
>> +    case PRED_x:
>> +      new_lhs = x;
>> +      break;
>> +    case PRED_implicit:
>> +      new_lhs = implicit;
>> +      break;
>> +    default:
>> +      return NULL;
>> +    }
>> +  gcc_assert (new_lhs);
>> +  gimple_seq stmts = NULL;
>> +  gimple *g = gimple_build_assign (lhs, new_lhs);
>> +  gimple_seq_add_stmt_without_update (&stmts, g);
>> +  gsi_replace_with_seq_vops (gsi, stmts);
>> +  return g;
>> +}
>> +
>>  /* Try to fold the call.  Return the new statement on success and null
>>     on failure.  */
>>  gimple *
>> @@ -3685,6 +3735,9 @@ gimple_folder::fold ()
>>    /* First try some simplifications that are common to many functions.  */
>>    if (auto *call = redirect_pred_x ())
>>      return call;
>> +  if (pred != PRED_none)
>> +    if (auto *call = shape->fold_pfalse (*this))
>> +      return call;
>>
>>    return base->fold (*this);
>>  }
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
>> index 4cdc0541bdc..4e443a8192e 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
>> @@ -632,12 +632,15 @@ public:
>>    gcall *redirect_call (const function_instance &);
>>    gimple *redirect_pred_x ();
>>
>> +  bool arg_is_pfalse_p (unsigned int idx);
>> +
>>    gimple *fold_to_cstu (poly_uint64);
>>    gimple *fold_to_pfalse ();
>>    gimple *fold_to_ptrue ();
>>    gimple *fold_to_vl_pred (unsigned int);
>>    gimple *fold_const_binary (enum tree_code);
>>    gimple *fold_active_lanes_to (tree);
>> +  gimple *fold_by_pred (tree, tree, tree, tree);
>>
>>    gimple *fold ();
>>
>> @@ -796,6 +799,11 @@ public:
>>    /* Check whether the given call is semantically valid.  Return true
>>       if it is, otherwise report an error and return false.  */
>>    virtual bool check (function_checker &) const { return true; }
>> +
>> +  /* For a pfalse predicate, try to fold the given gimple call.
>> +     Return the new gimple statement on success, otherwise return null.  */
>> +  virtual gimple *fold_pfalse (gimple_folder &) const { return NULL; }
>> +
>>  };
>>
>>  /* RAII class for enabling enough SVE features to define the built-in
>> @@ -829,6 +837,7 @@ extern tree acle_svprfop;
>>
>>  bool vector_cst_all_same (tree, unsigned int);
>>  bool is_ptrue (tree, unsigned int);
>> +bool is_pfalse (tree);
>>  const function_instance *lookup_fndecl (tree);
>>
>>  /* Try to find a mode with the given mode_suffix_info fields.  Return the
>> diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c
>> new file mode 100644
>> index 00000000000..3910ab36b6b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c
>> @@ -0,0 +1,176 @@
>> +#include <arm_sve.h>
>> +
>> +#define MXZ(F, RTY, TY1, TY2) \
>> +  RTY F##_f (TY1 op1, TY2 op2) \
>> +  { \
>> +    return sv##F (svpfalse_b (), op1, op2); \
>> +  }
>> +
>> +#define PRED_MXv(F, RTY, TYPE1, TYPE2, TY) \
>> +  MXZ (F##_##TY##_m, sv##RTY, sv##TYPE1, sv##TYPE2) \
>> +  MXZ (F##_##TY##_x, sv##RTY, sv##TYPE1, sv##TYPE2)
>> +
>> +#define PRED_Zv(F, RTY, TYPE1, TYPE2, TY) \
>> +  MXZ (F##_##TY##_z, sv##RTY, sv##TYPE1, sv##TYPE2)
>> +
>> +#define PRED_MXZv(F, RTY, TYPE1, TYPE2, TY) \
>> +  PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \
>> +  PRED_Zv (F, RTY, TYPE1, TYPE2, TY)
>> +
>> +#define PRED_Z(F, RTY, TYPE1, TYPE2, TY) \
>> +  PRED_Zv (F, RTY, TYPE1, TYPE2, TY) \
>> +  MXZ (F##_n_##TY##_z, sv##RTY, sv##TYPE1, TYPE2)
>> +
>> +#define PRED_MXZ(F, RTY, TYPE1, TYPE2, TY) \
>> +  PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \
>> +  MXZ (F##_n_##TY##_m, sv##RTY, sv##TYPE1, TYPE2) \
>> +  MXZ (F##_n_##TY##_x, sv##RTY, sv##TYPE1, TYPE2) \
>> +  PRED_Z (F, RTY, TYPE1, TYPE2, TY)
>> +
>> +#define PRED_IMPLICIT(F, RTY, TYPE1, TYPE2, TY) \
>> +  MXZ (F##_##TY, sv##RTY, sv##TYPE1, sv##TYPE2)
>> +
>> +#define ALL_Q_INTEGER(F, P) \
>> +  PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \
>> +  PRED_##P (F, int8_t, int8_t, int8_t, s8)
>> +
>> +#define ALL_Q_INTEGER_UINT(F, P) \
>> +  PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \
>> +  PRED_##P (F, int8_t, int8_t, uint8_t, s8)
>> +
>> +#define ALL_Q_INTEGER_INT(F, P) \
>> +  PRED_##P (F, uint8_t, uint8_t, int8_t, u8) \
>> +  PRED_##P (F, int8_t, int8_t, int8_t, s8)
>> +
>> +#define ALL_H_INTEGER(F, P) \
>> +  PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \
>> +  PRED_##P (F, int16_t, int16_t, int16_t, s16)
>> +
>> +#define ALL_H_INTEGER_UINT(F, P) \
>> +  PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \
>> +  PRED_##P (F, int16_t, int16_t, uint16_t, s16)
>> +
>> +#define ALL_H_INTEGER_INT(F, P) \
>> +  PRED_##P (F, uint16_t, uint16_t, int16_t, u16) \
>> +  PRED_##P (F, int16_t, int16_t, int16_t, s16)
>> +
>> +#define ALL_H_INTEGER_WIDE(F, P) \
>> +  PRED_##P (F, uint16_t, uint16_t, uint8_t, u16) \
>> +  PRED_##P (F, int16_t, int16_t, int8_t, s16)
>> +
>> +#define ALL_S_INTEGER(F, P) \
>> +  PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \
>> +  PRED_##P (F, int32_t, int32_t, int32_t, s32)
>> +
>> +#define ALL_S_INTEGER_UINT(F, P) \
>> +  PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \
>> +  PRED_##P (F, int32_t, int32_t, uint32_t, s32)
>> +
>> +#define ALL_S_INTEGER_INT(F, P) \
>> +  PRED_##P (F, uint32_t, uint32_t, int32_t, u32) \
>> +  PRED_##P (F, int32_t, int32_t, int32_t, s32)
>> +
>> +#define ALL_S_INTEGER_WIDE(F, P) \
>> +  PRED_##P (F, uint32_t, uint32_t, uint16_t, u32) \
>> +  PRED_##P (F, int32_t, int32_t, int16_t, s32)
>> +
>> +#define ALL_D_INTEGER(F, P) \
>> +  PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \
>> +  PRED_##P (F, int64_t, int64_t, int64_t, s64)
>> +
>> +#define ALL_D_INTEGER_UINT(F, P) \
>> +  PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \
>> +  PRED_##P (F, int64_t, int64_t, uint64_t, s64)
>> +
>> +#define ALL_D_INTEGER_INT(F, P) \
>> +  PRED_##P (F, uint64_t, uint64_t, int64_t, u64) \
>> +  PRED_##P (F, int64_t, int64_t, int64_t, s64)
>> +
>> +#define ALL_D_INTEGER_WIDE(F, P) \
>> +  PRED_##P (F, uint64_t, uint64_t, uint32_t, u64) \
>> +  PRED_##P (F, int64_t, int64_t, int32_t, s64)
>> +
>> +#define SD_INTEGER_TO_UINT(F, P) \
>> +  PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \
>> +  PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \
>> +  PRED_##P (F, uint32_t, int32_t, int32_t, s32) \
>> +  PRED_##P (F, uint64_t, int64_t, int64_t, s64)
>> +
>> +#define BHS_UNSIGNED_UINT64(F, P) \
>> +  PRED_##P (F, uint8_t, uint8_t, uint64_t, u8) \
>> +  PRED_##P (F, uint16_t, uint16_t, uint64_t, u16) \
>> +  PRED_##P (F, uint32_t, uint32_t, uint64_t, u32)
>> +
>> +#define BHS_SIGNED_UINT64(F, P) \
>> +  PRED_##P (F, int8_t, int8_t, uint64_t, s8) \
>> +  PRED_##P (F, int16_t, int16_t, uint64_t, s16) \
>> +  PRED_##P (F, int32_t, int32_t, uint64_t, s32)
>> +
>> +#define ALL_UNSIGNED_UINT(F, P) \
>> +  PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \
>> +  PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \
>> +  PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \
>> +  PRED_##P (F, uint64_t, uint64_t, uint64_t, u64)
>> +
>> +#define ALL_SIGNED_UINT(F, P) \
>> +  PRED_##P (F, int8_t, int8_t, uint8_t, s8) \
>> +  PRED_##P (F, int16_t, int16_t, uint16_t, s16) \
>> +  PRED_##P (F, int32_t, int32_t, uint32_t, s32) \
>> +  PRED_##P (F, int64_t, int64_t, uint64_t, s64)
>> +
>> +#define ALL_FLOAT(F, P) \
>> +  PRED_##P (F, float16_t, float16_t, float16_t, f16) \
>> +  PRED_##P (F, float32_t, float32_t, float32_t, f32) \
>> +  PRED_##P (F, float64_t, float64_t, float64_t, f64)
>> +
>> +#define ALL_FLOAT_INT(F, P) \
>> +  PRED_##P (F, float16_t, float16_t, int16_t, f16) \
>> +  PRED_##P (F, float32_t, float32_t, int32_t, f32) \
>> +  PRED_##P (F, float64_t, float64_t, int64_t, f64)
>> +
>> +#define B(F, P) \
>> +  PRED_##P (F, bool_t, bool_t, bool_t, b)
>> +
>> +#define ALL_SD_INTEGER(F, P) \
>> +  ALL_S_INTEGER (F, P) \
>> +  ALL_D_INTEGER (F, P)
>> +
>> +#define HSD_INTEGER_WIDE(F, P) \
>> +  ALL_H_INTEGER_WIDE (F, P) \
>> +  ALL_S_INTEGER_WIDE (F, P) \
>> +  ALL_D_INTEGER_WIDE (F, P)
>> +
>> +#define BHS_INTEGER_UINT64(F, P) \
>> +  BHS_UNSIGNED_UINT64 (F, P) \
>> +  BHS_SIGNED_UINT64 (F, P)
>> +
>> +#define ALL_INTEGER(F, P) \
>> +  ALL_Q_INTEGER (F, P) \
>> +  ALL_H_INTEGER (F, P) \
>> +  ALL_S_INTEGER (F, P) \
>> +  ALL_D_INTEGER (F, P)
>> +
>> +#define ALL_INTEGER_UINT(F, P) \
>> +  ALL_Q_INTEGER_UINT (F, P) \
>> +  ALL_H_INTEGER_UINT (F, P) \
>> +  ALL_S_INTEGER_UINT (F, P) \
>> +  ALL_D_INTEGER_UINT (F, P)
>> +
>> +#define ALL_INTEGER_INT(F, P) \
>> +  ALL_Q_INTEGER_INT (F, P) \
>> +  ALL_H_INTEGER_INT (F, P) \
>> +  ALL_S_INTEGER_INT (F, P) \
>> +  ALL_D_INTEGER_INT (F, P)
>> +
>> +#define ALL_FLOAT_AND_SD_INTEGER(F, P) \
>> +  ALL_SD_INTEGER (F, P) \
>> +  ALL_FLOAT (F, P)
>> +
>> +#define ALL_ARITH(F, P) \
>> +  ALL_INTEGER (F, P) \
>> +  ALL_FLOAT (F, P)
>> +
>> +#define ALL_DATA(F, P) \
>> +  ALL_ARITH (F, P) \
>> +  PRED_##P (F, bfloat16_t, bfloat16_t, bfloat16_t, bf16)
>> +
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c
>> new file mode 100644
>> index 00000000000..ef629413185
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +B (brkn, Zv)
>> +B (brkpa, Zv)
>> +B (brkpb, Zv)
>> +ALL_DATA (splice, IMPLICIT)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 3 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0\.d, z1\.d\n\tret\n} 12 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c
>> new file mode 100644
>> index 00000000000..91d574f9249
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_FLOAT_INT (scale, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c
>> new file mode 100644
>> index 00000000000..25c793ff40f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c
>> @@ -0,0 +1,30 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_ARITH (abd, MXZ)
>> +ALL_ARITH (add, MXZ)
>> +ALL_INTEGER (and, MXZ)
>> +B (and, Zv)
>> +ALL_INTEGER (bic, MXZ)
>> +B (bic, Zv)
>> +ALL_FLOAT_AND_SD_INTEGER (div, MXZ)
>> +ALL_FLOAT_AND_SD_INTEGER (divr, MXZ)
>> +ALL_INTEGER (eor, MXZ)
>> +B (eor, Zv)
>> +ALL_ARITH (mul, MXZ)
>> +ALL_INTEGER (mulh, MXZ)
>> +ALL_FLOAT (mulx, MXZ)
>> +B (nand, Zv)
>> +B (nor, Zv)
>> +B (orn, Zv)
>> +ALL_INTEGER (orr, MXZ)
>> +B (orr, Zv)
>> +ALL_ARITH (sub, MXZ)
>> +ALL_ARITH (subr, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 448 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 224 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 7 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 679 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c
>> new file mode 100644
>> index 00000000000..8d187c22eec
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_ARITH (max, MXZ)
>> +ALL_ARITH (min, MXZ)
>> +ALL_FLOAT (maxnm, MXZ)
>> +ALL_FLOAT (minnm, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 112 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 56 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 168 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c
>> new file mode 100644
>> index 00000000000..9940866d5ef
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include <arm_sve.h>
>> +
>> +#define MXZ4(F, TYPE) \
>> +  TYPE F##_f (TYPE op1, TYPE op2) \
>> +  { \
>> +    return sv##F (svpfalse_b (), op1, op2, 90); \
>> +  }
>> +
>> +#define PRED_MXZ(F, TYPE, TY) \
>> +  MXZ4 (F##_##TY##_m, TYPE) \
>> +  MXZ4 (F##_##TY##_x, TYPE) \
>> +  MXZ4 (F##_##TY##_z, TYPE)
>> +
>> +#define ALL_FLOAT(F, P) \
>> +  PRED_##P (F, svfloat16_t, f16) \
>> +  PRED_##P (F, svfloat32_t, f32) \
>> +  PRED_##P (F, svfloat64_t, f64)
>> +
>> +ALL_FLOAT (cadd, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 6 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 3 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c
>> new file mode 100644
>> index 00000000000..f8fd18043e6
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +BHS_SIGNED_UINT64 (asr_wide, MXZ)
>> +BHS_INTEGER_UINT64 (lsl_wide, MXZ)
>> +BHS_UNSIGNED_UINT64 (lsr_wide, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 48 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 24 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c
>> new file mode 100644
>> index 00000000000..2f1d7721bc8
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_SIGNED_UINT (asr, MXZ)
>> +ALL_INTEGER_UINT (lsl, MXZ)
>> +ALL_UNSIGNED_UINT (lsr, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 64 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 32 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 96 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c
>> new file mode 100644
>> index 00000000000..723fcd0a203
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_ARITH (addp, MXv)
>> +ALL_ARITH (maxp, MXv)
>> +ALL_FLOAT (maxnmp, MXv)
>> +ALL_ARITH (minp, MXv)
>> +ALL_FLOAT (minnmp, MXv)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 78 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 78 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c
>> new file mode 100644
>> index 00000000000..6e8be86f9b0
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_INTEGER_INT (rshl, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 32 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 16 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 48 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c
>> new file mode 100644
>> index 00000000000..7335a4ff011
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_INTEGER (hadd, MXZ)
>> +ALL_INTEGER (hsub, MXZ)
>> +ALL_INTEGER (hsubr, MXZ)
>> +ALL_INTEGER (qadd, MXZ)
>> +ALL_INTEGER (qsub, MXZ)
>> +ALL_INTEGER (qsubr, MXZ)
>> +ALL_INTEGER (rhadd, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 224 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 112 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 336 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c
>> new file mode 100644
>> index 00000000000..e03e1e890f8
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c
>> @@ -0,0 +1,9 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +SD_INTEGER_TO_UINT (histcnt, Zv)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 4 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c
>> new file mode 100644
>> index 00000000000..2649fc01954
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +ALL_SIGNED_UINT (uqadd, MXZ)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 16 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 8 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c
>> new file mode 100644
>> index 00000000000..72693d01ad0
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "../pfalse-binary_0.c"
>> +
>> +HSD_INTEGER_WIDE (adalp, MXZv)
>> +
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */
>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */
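
As a cross-check of the directive counts in the tests above, the expected numbers can be derived from how many functions each PRED_* macro generates per element type. The sketch below is only an illustration: the Counts struct and the helpers scale and total are hypothetical names, not part of the patch, and the per-type counts assume the behaviour of the patch as posted (_m and _x forms fold to op1, _z forms fold to zero).

```cpp
#include <cassert>

// Hypothetical helper for illustration; not part of the patch.
struct Counts
{
  int ret;   // functions expected to compile to a lone RET
  int zero;  // functions expected to compile to a zeroing move plus RET
};

// Functions generated per element type, read off the PRED_* macros in
// pfalse-binary_0.c as posted.
constexpr Counts per_type_MXZ = {4, 2};   // _m, _x, _n_m, _n_x / _z, _n_z
constexpr Counts per_type_MXZv = {2, 1};  // _m, _x / _z
constexpr Counts per_type_MXv = {2, 0};   // _m, _x

// Multiply the per-type counts by the number of element-type instances.
constexpr Counts scale (Counts per_type, int num_types)
{
  return {per_type.ret * num_types, per_type.zero * num_types};
}

// Total number of functions, including any counted by other directives
// (for example the pfalse results of the B (..., Zv) tests).
constexpr int total (Counts c, int other = 0)
{
  return c.ret + c.zero + other;
}
```

For example, HSD_INTEGER_WIDE (adalp, MXZv) covers 6 element types, which gives 12 RET-only functions, 6 zeroing moves and 18 functions in total, matching sve2/pfalse-binary_wide.c. In sve/pfalse-binary_opt_n.c the MXZ lines cover 112 intrinsic/type combinations and there are 7 B (..., Zv) lines, which gives 448, 224 and 679.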