Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> With the patch, only the following FAIL remains for aarch64-sve.exp:
> FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
>   scan-assembler-times \\tmovprfx\\t 6
> which now contains 14.
> Should I adjust the test, assuming the change isn't a regression?
Well, it is kind-of a regression, but it really just means that the
integer code is now consistent with the floating-point code in having
an unnecessary MOVPRFX.  So I think adjusting the count is fine.
Presumably any future fix for the existing redundant MOVPRFXs will
apply to the new ones as well.

The patch looks good to me, just some very minor nits:

> @@ -8309,11 +8309,12 @@ vect_double_mask_nunits (tree type)
>  
>  /* Record that a fully-masked version of LOOP_VINFO would need MASKS to
>     contain a sequence of NVECTORS masks that each control a vector of type
> -   VECTYPE.  */
> +   VECTYPE. SCALAR_MASK if non-null, represents the mask used for corresponding
> +   load/store stmt.  */

Should be two spaces between sentences.  Maybe:

   VECTYPE.  If SCALAR_MASK is nonnull, the fully-masked loop would AND
   these vector masks with the vector version of SCALAR_MASK.  */

since the mask isn't necessarily for a load or store statement.

> [...]
> @@ -1879,7 +1879,8 @@ static tree permute_vec_elements (tree, tree, tree, stmt_vec_info,
>     says how the load or store is going to be implemented and GROUP_SIZE
>     is the number of load or store statements in the containing group.
>     If the access is a gather load or scatter store, GS_INFO describes
> -   its arguments.
> +   its arguments. SCALAR_MASK is the scalar mask used for corresponding
> +   load or store stmt.

Maybe:

   its arguments.  If the load or store is conditional, SCALAR_MASK is
   the condition under which it occurs.

since SCALAR_MASK can be null here too.

> [...]
> @@ -9975,6 +9978,31 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>    /* Handle cond expr.  */
>    for (j = 0; j < ncopies; j++)
>      {
> +      tree loop_mask = NULL_TREE;
> +      bool swap_cond_operands = false;
> +
> +      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> +        {
> +          scalar_cond_masked_key cond (cond_expr, ncopies);
> +          if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> +            {
> +              vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +              loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> +            }
> +          else
> +            {
> +              cond.code = invert_tree_comparison (cond.code,
> +                                                  HONOR_NANS (TREE_TYPE (cond.op0)));

Long line.  Maybe just split it out into a separate assignment:

              bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
              cond.code = invert_tree_comparison (cond.code, honor_nans);

> +              if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> +                {
> +                  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +                  loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);

Long line here too.

> [...]
> @@ -10090,6 +10121,26 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>              }
>          }
>      }
> +
> +      if (loop_mask)
> +        {
> +          if (COMPARISON_CLASS_P (vec_compare))
> +            {
> +              tree tmp = make_ssa_name (vec_cmp_type);
> +              gassign *g = gimple_build_assign (tmp,
> +                                                TREE_CODE (vec_compare),
> +                                                TREE_OPERAND (vec_compare, 0),
> +                                                TREE_OPERAND (vec_compare, 1));

Two long lines.

> +              vect_finish_stmt_generation (stmt_info, g, gsi);
> +              vec_compare = tmp;
> +            }
> +
> +          tree tmp2 = make_ssa_name (vec_cmp_type);
> +          gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare, loop_mask);

Long line here too.
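FWIW, all of these fit within 80 columns once the calls are rewrapped,
e.g. (a sketch of the same two statements, only reindented):

              gassign *g
                = gimple_build_assign (tmp, TREE_CODE (vec_compare),
                                       TREE_OPERAND (vec_compare, 0),
                                       TREE_OPERAND (vec_compare, 1));

and:

          gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR,
                                            vec_compare, loop_mask);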
> [...]
> diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> index dc181524744..c4b2d8e8647 100644
> --- a/gcc/tree-vectorizer.c
> +++ b/gcc/tree-vectorizer.c
> @@ -1513,3 +1513,39 @@ make_pass_ipa_increase_alignment (gcc::context *ctxt)
>  {
>    return new pass_ipa_increase_alignment (ctxt);
>  }
> +
> +/* If code(T) is comparison op or def of comparison stmt,
> +   extract it's operands.
> +   Else return <NE_EXPR, T, 0>.  */
> +
> +void
> +scalar_cond_masked_key::get_cond_ops_from_tree (tree t)
> +{
> +  if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison)
> +    {
> +      this->code = TREE_CODE (t);
> +      this->op0 = TREE_OPERAND (t, 0);
> +      this->op1 = TREE_OPERAND (t, 1);
> +      return;
> +    }
> +
> +  if (TREE_CODE (t) == SSA_NAME)
> +    {
> +      gassign *stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (t));
> +      if (stmt)
> +        {

Might as well do this as:

  if (TREE_CODE (t) == SSA_NAME)
    if (gassign *stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (t)))
      {
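(The declaration-in-condition form also keeps STMT scoped to the block
that can actually use it.  A minimal self-contained illustration of the
idiom, using the standard dynamic_cast instead of GCC's dyn_cast, with
invented node/assign types:

  #include <iostream>

  struct node { virtual ~node () {} };
  struct assign : node { int value = 42; };

  static void
  process (node *n)
  {
    /* A is declared in the condition, so it is in scope only in the
       branch where the cast succeeded.  */
    if (assign *a = dynamic_cast<assign *> (n))
      std::cout << "assign with value " << a->value << "\n";
    else
      std::cout << "not an assign\n";
  }

  int
  main ()
  {
    assign a;
    node n;
    process (&a);  /* prints "assign with value 42" */
    process (&n);  /* prints "not an assign" */
  }

The same pattern applies directly to the dyn_cast call above.)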
The patch (as hoped) introduces some XPASSes:

XPASS: gcc.target/aarch64/sve/cond_cnot_2.c scan-assembler-not \\tsel\\t
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmuo\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 252
XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times \\tfcmuo\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 180
XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0 21
XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d 42
XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0 15
XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s 30
XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0 21
XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d 42
XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0 15
XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s 30

Could you remove the associated xfails (and comments above them where
appropriate)?

OK with those changes from my POV, but please give Richi a day or so
to object.  Thanks for doing this.

Richard