Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> Hi,
> The attached patch tries to fix PR91272.
> Does it look OK ?
>
> With patch, I see following failures for aarch64-sve.exp:
> FAIL: gcc.target/aarch64/sve/clastb_1.c -march=armv8.2-a+sve  scan-assembler \\tclastb\\tw[0-9]+, p[0-7], w[0-9]+, z[0-9]+\\.s
> FAIL: gcc.target/aarch64/sve/clastb_2.c -march=armv8.2-a+sve  scan-assembler \\tclastb\\tw[0-9]+, p[0-7]+, w[0-9]+, z[0-9]+\\.s
> FAIL: gcc.target/aarch64/sve/clastb_3.c -march=armv8.2-a+sve  scan-assembler \\tclastb\\tw[0-9]+, p[0-7]+, w[0-9]+, z[0-9]+\\.b
> FAIL: gcc.target/aarch64/sve/clastb_5.c -march=armv8.2-a+sve  scan-assembler \\tclastb\\tx[0-9]+, p[0-7], x[0-9]+, z[0-9]+\\.d
>
> For instance, in clastb_1.c, it now emits:
> clastb  s1, p1, s1, z0.s
> while using a fully predicated loop.
> Should I adjust the tests ?
Yeah, that's an improvement really.

> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index acdd90784dc..2cad2cb94c8 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -10016,7 +10016,8 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>    /* See whether another part of the vectorized code applies a loop
>       mask to the condition, or to its inverse.  */
>
> -  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> +  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> +      && reduction_type != EXTRACT_LAST_REDUCTION)
>      {
>        scalar_cond_masked_key cond (cond_expr, ncopies);
>        if (loop_vinfo->scalar_cond_masked_set.contains (cond))

The context here is:

      if (loop_vinfo->scalar_cond_masked_set.contains (cond))
        {
          vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
          loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
        }
      else
        {
          bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
          cond.code = invert_tree_comparison (cond.code, honor_nans);
          if (loop_vinfo->scalar_cond_masked_set.contains (cond))
            {
              vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
              loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
                                              vectype, j);
              cond_code = cond.code;
              swap_cond_operands = true;
            }
        }

Rather than have another instance of vect_get_loop_mask, I think it
would be cleaner to use a nonnull "masks" to decide whether to apply
the loop mask:

  vec_loop_masks *masks = NULL;
  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
    {
      if (reduction_type == EXTRACT_LAST_REDUCTION
          || loop_vinfo->scalar_cond_masked_set.contains (cond))
        masks = &LOOP_VINFO_MASKS (loop_vinfo);
      else
        {
          ...
        }

Then:

> @@ -10116,6 +10117,15 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>            vec_then_clause = vec_oprnds2[i];
>            vec_else_clause = vec_oprnds3[i];
>
> +          if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> +              && reduction_type == EXTRACT_LAST_REDUCTION)
> +            {
> +              vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +              unsigned vec_num = vec_oprnds0.length ();
> +              loop_mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
> +                                              vectype, vec_num * j + i);
> +            }
> +

...do this vect_get_loop_mask under the condition of "if (masks)".
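Something like the following, just to show the shape; untested, and it
assumes "masks" has been set up earlier as in the sketch above:

          /* Reuse the "masks" computed earlier rather than re-testing
             LOOP_VINFO_FULLY_MASKED_P and the reduction type here.  */
          if (masks)
            {
              unsigned vec_num = vec_oprnds0.length ();
              loop_mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
                                              vectype, vec_num * j + i);
            }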
>            if (swap_cond_operands)
>              std::swap (vec_then_clause, vec_else_clause);
>
> @@ -10180,7 +10190,7 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>               vec != { 0, ... } (masked in the MASK_LOAD,
>               unmasked in the VEC_COND_EXPR).  */
>
> -          if (loop_mask)
> +          if (loop_mask && reduction_type != EXTRACT_LAST_REDUCTION)
>              {
>                if (COMPARISON_CLASS_P (vec_compare))
>                  {
> @@ -10220,6 +10230,16 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>                vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
>                vec_compare = vec_compare_name;
>              }

The code above here:

          if (!is_gimple_val (vec_compare))
            {
              tree vec_compare_name = make_ssa_name (vec_cmp_type);
              gassign *new_stmt = gimple_build_assign (vec_compare_name,
                                                       vec_compare);
              vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
              vec_compare = vec_compare_name;
            }

is doing something similar to the new COND_EXPR handling:

          if (COMPARISON_CLASS_P (vec_compare))
            {
              tree tmp = make_ssa_name (vec_cmp_type);
              tree op0 = TREE_OPERAND (vec_compare, 0);
              tree op1 = TREE_OPERAND (vec_compare, 1);
              gassign *g = gimple_build_assign (tmp, TREE_CODE (vec_compare),
                                                op0, op1);
              vect_finish_stmt_generation (stmt_info, g, gsi);
              vec_compare = tmp;
            }

There's then an added case:

          if (must_invert_cmp_result)
            {
              tree vec_compare_name = make_ssa_name (vec_cmp_type);
              gassign *new_stmt = gimple_build_assign (vec_compare_name,
                                                       BIT_NOT_EXPR,
                                                       vec_compare);
              vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
              vec_compare = vec_compare_name;
            }

that would be a safe no-op for the normal COND_EXPR path (because
must_invert_cmp_result is always false then).  Then this:

> +
> +              if (loop_mask)
> +                {
> +                  tree tmp = make_ssa_name (vec_cmp_type);
> +                  gassign *g = gimple_build_assign (tmp, BIT_AND_EXPR,
> +                                                    vec_compare, loop_mask);
> +                  vect_finish_stmt_generation (stmt_info, g, gsi);
> +                  vec_compare = tmp;
> +                }
> +

is the equivalent of the above:

              tree tmp2 = make_ssa_name (vec_cmp_type);
              gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR,
                                                vec_compare, loop_mask);
              vect_finish_stmt_generation (stmt_info, g, gsi);
              vec_compare = tmp2;

So with this patch, I think the EXTRACT_LAST_REDUCTION and the normal
COND_EXPR paths should be able to share the mask generation code.

Thanks,
Richard
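P.S. To make that last point concrete, the shared tail could end up
looking roughly like this; it is just the three snippets above stitched
together, untested, so treat it as a sketch rather than the final shape:

          /* Turn a bare comparison into an SSA mask (both paths).  */
          if (COMPARISON_CLASS_P (vec_compare))
            {
              tree tmp = make_ssa_name (vec_cmp_type);
              tree op0 = TREE_OPERAND (vec_compare, 0);
              tree op1 = TREE_OPERAND (vec_compare, 1);
              gassign *g = gimple_build_assign (tmp, TREE_CODE (vec_compare),
                                                op0, op1);
              vect_finish_stmt_generation (stmt_info, g, gsi);
              vec_compare = tmp;
            }

          /* Only the EXTRACT_LAST_REDUCTION path ever sets
             must_invert_cmp_result, so this is a no-op for the normal
             COND_EXPR path.  */
          if (must_invert_cmp_result)
            {
              tree tmp = make_ssa_name (vec_cmp_type);
              gassign *g = gimple_build_assign (tmp, BIT_NOT_EXPR,
                                                vec_compare);
              vect_finish_stmt_generation (stmt_info, g, gsi);
              vec_compare = tmp;
            }

          /* AND in the loop mask when the loop is fully masked.  */
          if (loop_mask)
            {
              tree tmp = make_ssa_name (vec_cmp_type);
              gassign *g = gimple_build_assign (tmp, BIT_AND_EXPR,
                                                vec_compare, loop_mask);
              vect_finish_stmt_generation (stmt_info, g, gsi);
              vec_compare = tmp;
            }

The only behavioural difference from the current COND_EXPR path would be
the extra must_invert_cmp_result test, which is always false there.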