To align vectorized def/use when lane-reducing op is present in loop reduction, we may need to insert extra trivial pass-through copies, which would cause mismatch between lane-reducing vector copy and loop mask index. This could be fixed by computing the right index around a new counter on effective lane- reducing vector copies.
Thanks, Feng --- gcc/ PR tree-optimization/116985 * tree-vect-loop.cc (vect_transform_reduction): Compute loop mask index based on effective vector copies for reduction op. --- gcc/tree-vect-loop.cc | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index ade72a5124f..025442aabc3 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -8916,6 +8916,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo, bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info); unsigned num = vec_oprnds[reduc_index == 0 ? 1 : 0].length (); + unsigned mask_index = 0; for (unsigned i = 0; i < num; ++i) { @@ -8954,7 +8955,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo, std::swap (vop[0], vop[1]); } tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, - vec_num * ncopies, vectype_in, i); + vec_num * ncopies, vectype_in, + mask_index++); gcall *call = gimple_build_call_internal (cond_fn, 4, mask, vop[0], vop[1], vop[0]); new_temp = make_ssa_name (vec_dest, call); @@ -8971,7 +8973,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo, if (masked_loop_p && mask_by_cond_expr) { tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, - vec_num * ncopies, vectype_in, i); + vec_num * ncopies, vectype_in, + mask_index++); build_vect_cond_expr (code, vop, mask, gsi); } -- 2.17.1