On Tue, 15 Apr 2025, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Tuesday, April 15, 2025 12:50 PM
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: Richard Sandiford <richard.sandif...@arm.com>; gcc-patches@gcc.gnu.org;
> > nd <n...@arm.com>
> > Subject: RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS
> > [PR119351]
> >
> > On Tue, 15 Apr 2025, Tamar Christina wrote:
> >
> > > > -----Original Message-----
> > > > From: Richard Sandiford <richard.sandif...@arm.com>
> > > > Sent: Tuesday, April 15, 2025 10:52 AM
> > > > To: Tamar Christina <tamar.christ...@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; rguent...@suse.de
> > > > Subject: Re: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS
> > > > [PR119351]
> > > >
> > > > Tamar Christina <tamar.christ...@arm.com> writes:
> > > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > > > index 56a4e9a8b63f3cae0bf596bf5d22893887dc80e8..0722679d6e66e5dd5af4ec1ce591f7c38b76d07f 100644
> > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > @@ -2195,6 +2195,22 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info loop_vinfo,
> > > > >        return false;
> > > > >      }
> > > > >
> > > > > +  /* With early break vectorization we don't know whether the accesses will stay
> > > > > +     inside the loop or not.  TODO: The early break adjustment code can be
> > > > > +     implemented the same way for vectorizable_linear_induction.  However we
> > > > > +     can't test this today so reject it.  */
> > > > > +  if (niters_skip != NULL_TREE
> > > > > +      && vect_use_loop_mask_for_alignment_p (loop_vinfo)
> > > > > +      && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> > > > > +      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +    {
> > > > > +      if (dump_enabled_p ())
> > > > > +        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > +                         "Peeling for alignment using masking is not supported"
> > > > > +                         " for nonlinear induction when using early breaks.\n");
> > > > > +      return false;
> > > > > +    }
> > > > > +
> > > > >    return true;
> > > > >  }
> > > >
> > > > FTR, I was wondering here whether we should predict this in advance and
> > > > instead drop down to peeling for alignment without masks.  It probably
> > > > isn't worth the effort though.
> > >
> > > We could move the check into vect_use_loop_mask_for_alignment_p where
> > > rejecting it there would get it to fall back to scalar peeling.  That
> > > seems simple enough if that's preferable.
> >
> > The above is preferable IMO (short of fixing up that case, but with
> > a testcase).
> >
>
> I wasn't able to make a testcase before, as any non-linear induction feeding
> a load becomes a gather load, which we block outright well before getting
> here.  I couldn't think of an example where it wouldn't be; even a gapped
> load, e.g. +=2, became one.
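For reference, this is the kind of loop being described, a nonlinear IV
feeding a load, which the vectorizer classifies as a gather (a hypothetical
sketch; the function name and signature are made up, not from the thread):

  /* Hypothetical sketch: the nonlinear IV `idx' (idx *= 2) indexes a
     load, so consecutive iterations do not touch consecutive elements.
     The vectorizer classifies y[idx] as a gather and rejects it long
     before vect_can_peel_nonlinear_iv_p is reached.  Assumes y is
     large enough for the exponentially growing index.  */
  int
  gather_example (int *y, int n)
  {
    int sum = 0;
    for (int i = 0, idx = 1; i < n; ++i, idx *= 2)
      sum += y[idx];
    return sum;
  }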
Well, a classic one would be

  int x = 2;
  for (i = 0; i < n; ++i)
    {
      if (y[i])
        break;
      x *= 3;
    }
  return x;

or the negate case

  int x = 2;
  for (i = 0; i < n; ++i)
    {
      if (y[i])
        break;
      x = -x;
    }
  return x;

Possibly we mark those as unhandled for early exits somewhere already.
There does seem to be code handling PFA with masks for them.
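As a self-contained sketch, the first example above wrapped into a complete
function (the function name and parameter types are illustrative assumptions,
not from this thread):

  /* Early break combined with a multiplicative (nonlinear) induction.
     x is live on the early exit, so with masked alignment peeling its
     exit value would need an adjustment for the skipped iterations;
     the new vect_can_peel_nonlinear_iv_p check rejects this
     combination instead.  */
  int
  classic_mult (int *y, int n)
  {
    int x = 2;
    for (int i = 0; i < n; ++i)
      {
        if (y[i])
          break;
        x *= 3;
      }
    return x;
  }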
Richard.

> Thanks,
> Tamar
>
> > Richard.
> >
> > > Cheers,
> > > Tamar
> > >
> > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > > index 9413dcef702597ab27165e676546b190e2bd36ba..6dcdee19bb250993d8cc6b0057d2fa46245d04d9 100644
> > > > > --- a/gcc/tree-vect-loop.cc
> > > > > +++ b/gcc/tree-vect-loop.cc
> > > > > @@ -10678,6 +10678,104 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> > > > >                                    LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo));
> > > > >        peel_mul = gimple_build_vector_from_val (&init_stmts, step_vectype,
> > > > >                                                 peel_mul);
> > > > > +
> > > > > +      /* If early break then we have to create a new PHI which we can use as
> > > > > +         an offset to adjust the induction reduction in early exits.  */
> > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +        {
> > > > > +          auto skip_niters = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
> > > > > +          tree ty_skip_niters = TREE_TYPE (skip_niters);
> > > > > +          tree break_lhs_phi = NULL_TREE;
> > > > > +          break_lhs_phi = vect_get_new_vect_var (ty_skip_niters,
> > > > > +                                                 vect_scalar_var,
> > > > > +                                                 "pfa_iv_offset");
> > > > > +          gphi *nphi = create_phi_node (break_lhs_phi, bb);
> > > > > +          add_phi_arg (nphi, skip_niters, pe, UNKNOWN_LOCATION);
> > > > > +          add_phi_arg (nphi, build_zero_cst (ty_skip_niters),
> > > > > +                       loop_latch_edge (iv_loop), UNKNOWN_LOCATION);
> > > > > +
> > > > > +          /* Rewrite all the early exit usages.  */
> > > > > +          tree phi_lhs = PHI_RESULT (phi);
> > > > > +          imm_use_iterator iter;
> > > > > +          use_operand_p use_p;
> > > > > +          gimple *use_stmt;
> > > > > +
> > > > > +          FOR_EACH_IMM_USE_FAST (use_p, iter, phi_lhs)
> > > > > +            {
> > > > > +              use_stmt = USE_STMT (use_p);
> > > > > +              if (!flow_bb_inside_loop_p (iv_loop, gimple_bb (use_stmt))
> > > > > +                  && is_a <gphi *> (use_stmt))
> > > > > +                {
> > > > > +                  auto gsi = gsi_last_bb (use_stmt->bb);
> > > > > +                  for (auto exit : get_loop_exit_edges (iv_loop))
> > > > > +                    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo)
> > > > > +                        && bb == exit->src)
> > > > > +                      {
> > > > > +                        /* Now create the PHI for the outside loop usage to
> > > > > +                           retrieve the value for the offset counter.  */
> > > > > +                        tree rphi_lhs = make_ssa_name (ty_skip_niters);
> > > > > +                        gphi *rphi
> > > > > +                          = create_phi_node (rphi_lhs, use_stmt->bb);
> > > > > +                        for (unsigned i = 0; i < gimple_phi_num_args (rphi);
> > > > > +                             i++)
> > > > > +                          SET_PHI_ARG_DEF (rphi, i, PHI_RESULT (nphi));
> > > > > +
> > > > > +                        tree tmp = make_ssa_name (TREE_TYPE (phi_lhs));
> > > > > +                        tree stmt_lhs = PHI_RESULT (use_stmt);
> > > > > +                        imm_use_iterator iter2;
> > > > > +                        gimple *use_stmt2;
> > > > > +                        use_operand_p use2_p;
> > > > > +
> > > > > +                        /* Now rewrite all the usages first.  */
> > > > > +                        FOR_EACH_IMM_USE_STMT (use_stmt2, iter2, stmt_lhs)
> > > > > +                          FOR_EACH_IMM_USE_ON_STMT (use2_p, iter2)
> > > > > +                            SET_USE (use2_p, tmp);
> > > > > +
> > > > > +                        /* And then generate the adjustment to avoid the
> > > > > +                           update code from updating this new usage.  The
> > > > > +                           multiplicaiton is to get the original IV and the
> > > > > +                           downwards counting IV correct.  */
> > > >
> > > > typo: multiplication
> > > >
> > > > But I don't think it's just upcounting vs downcounting.  An upcounting iv
> > > > with step 2 would also need the multiplication.  That is, we're applying
> > > > PHI_RESULT (rphi) iv updates, and so need to add the iv step that many
> > > > times.
> > > >
> > > > So IMO it would be clearer to drop the reference specifically to
> > > > downcounting here.
> > > >
> > > > The patch LGTM with the comment nit fixed, but Richi should have the
> > > > final say.
> > > >
> > > > Thanks,
> > > > Richard
> > > >
> > > > > +                        gimple_seq iv_stmts = NULL;
> > > > > +                        tree rphi_step
> > > > > +                          = gimple_convert (&iv_stmts, ty_skip_niters,
> > > > > +                                            step_expr);
> > > > > +                        tree tmp2
> > > > > +                          = gimple_build (&iv_stmts, MULT_EXPR,
> > > > > +                                          ty_skip_niters, rphi_step,
> > > > > +                                          PHI_RESULT (rphi));
> > > > > +
> > > > > +                        if (POINTER_TYPE_P (TREE_TYPE (stmt_lhs)))
> > > > > +                          tmp2
> > > > > +                            = gimple_build (&iv_stmts, POINTER_PLUS_EXPR,
> > > > > +                                            TREE_TYPE (stmt_lhs), stmt_lhs,
> > > > > +                                            tmp2);
> > > > > +                        else
> > > > > +                          {
> > > > > +                            tmp2
> > > > > +                              = gimple_convert (&iv_stmts,
> > > > > +                                                TREE_TYPE (stmt_lhs),
> > > > > +                                                tmp2);
> > > > > +                            tmp2
> > > > > +                              = gimple_build (&iv_stmts, PLUS_EXPR,
> > > > > +                                              TREE_TYPE (stmt_lhs), stmt_lhs,
> > > > > +                                              tmp2);
> > > > > +                          }
> > > > > +
> > > > > +                        gsi_insert_seq_before (&gsi, iv_stmts,
> > > > > +                                               GSI_SAME_STMT);
> > > > > +                        gimple *cvt_stmt =
> > > > > +                          gimple_build_assign (tmp, VIEW_CONVERT_EXPR,
> > > > > +                                               build1 (VIEW_CONVERT_EXPR,
> > > > > +                                                       TREE_TYPE (phi_lhs),
> > > > > +                                                       tmp2));
> > > > > +                        gsi_insert_before (&gsi, cvt_stmt, GSI_SAME_STMT);
> > > > > +                      }
> > > > > +                  /* All early exits point to the same common block, so we
> > > > > +                     only have to find the first one.  */
> > > > > +                  break;
> > > > > +                }
> > > > > +            }
> > > > > +        }
> > > > >      }
> > > > >    tree step_mul = NULL_TREE;
> > > > >    unsigned ivn;
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>

--
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)