> -----Original Message----- > From: Richard Biener <rguent...@suse.de> > Sent: Wednesday, November 6, 2024 2:30 PM > To: gcc-patches@gcc.gnu.org > Cc: Richard Sandiford <richard.sandif...@arm.com>; Tamar Christina > <tamar.christ...@arm.com> > Subject: [PATCH 3/5] Add LOOP_VINFO_MAIN_LOOP_INFO > > The following introduces LOOP_VINFO_MAIN_LOOP_INFO alongside > LOOP_VINFO_ORIG_LOOP_INFO so one can have both access to the main > vectorized loop info and the preceeding vectorized epilogue. > This is critical for correctness as we need to disallow never > executed epilogues by costing in vect_analyze_loop_costing as > we assume those do not exist when deciding to add a skip-vector > edge during peeling. The patch also changes how multiple vector > epilogues are handled - instead of the epilogue_vinfos array in > the main loop info we now record the single epilogue_vinfo there > and further epilogues in the epilogue_vinfo member of the > epilogue info. This simplifies code.
Nice! FWIW, makes sense and looks good to me :) Cheers, Tamar > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > * tree-vectorizer.h (_loop_vec_info::main_loop_info): New. > (LOOP_VINFO_MAIN_LOOP_INFO): Likewise. > (_loop_vec_info::epilogue_vinfo): Change from epilogue_vinfos > from array to single element. > * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize > main_loop_info and epilogue_vinfo. Remove epilogue_vinfos > allocation. > (_loop_vec_info::~_loop_vec_info): Do not release epilogue_vinfos. > (vect_create_loop_vinfo): Rename parameter, set > LOOP_VINFO_MAIN_LOOP_INFO. > (vect_analyze_loop_1): Rename parameter. > (vect_analyze_loop_costing): Properly distinguish between > the main vector loop and the preceeding epilogue. > (vect_analyze_loop): Change for epilogue_vinfos no longer > being a vector. > * tree-vect-loop-manip.cc (vect_do_peeling): Simplify and > thereby handle a vector epilogue of a vector epilogue. > --- > gcc/tree-vect-loop-manip.cc | 22 +++++------- > gcc/tree-vect-loop.cc | 67 ++++++++++++++++++++----------------- > gcc/tree-vectorizer.h | 12 +++++-- > 3 files changed, 53 insertions(+), 48 deletions(-) > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > index 5bbeeddd854..c8dc7153298 100644 > --- a/gcc/tree-vect-loop-manip.cc > +++ b/gcc/tree-vect-loop-manip.cc > @@ -3100,12 +3100,12 @@ vect_get_main_loop_result (loop_vec_info > loop_vinfo, tree main_loop_value, > The analysis resulting in this epilogue loop's loop_vec_info was performed > in the same vect_analyze_loop call as the main loop's. At that time > vect_analyze_loop constructs a list of accepted loop_vec_info's for lower > - vectorization factors than the main loop. This list is stored in the main > - loop's loop_vec_info in the 'epilogue_vinfos' member. Everytime we > decide to > - vectorize the epilogue loop for a lower vectorization factor, the > - loop_vec_info sitting at the top of the epilogue_vinfos list is removed, > - updated and linked to the epilogue loop. This is later used to vectorize > - the epilogue. The reason the loop_vec_info needs updating is that it was > + vectorization factors than the main loop. This list is chained in the > + loop's loop_vec_info in the 'epilogue_vinfo' member. When we decide to > + vectorize the epilogue loop for a lower vectorization factor, the > + loop_vec_info in epilogue_vinfo is updated and linked to the epilogue > loop. > + This is later used to vectorize the epilogue. > + The reason the loop_vec_info needs updating is that it was > constructed based on the original main loop, and the epilogue loop is a > copy of this loop, so all links pointing to statements in the original > loop > need updating. Furthermore, these loop_vec_infos share the > @@ -3128,7 +3128,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, > tree nitersm1, > profile_probability prob_prolog, prob_vector, prob_epilog; > int estimated_vf; > int prolog_peeling = 0; > - bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0; > + bool vect_epilogues = loop_vinfo->epilogue_vinfo != NULL; > /* We currently do not support prolog peeling if the target alignment is > not > known at compile time. 'vect_gen_prolog_loop_niters' depends on the > target alignment being constant. */ > @@ -3255,13 +3255,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree > niters, tree nitersm1, > else > niters_prolog = build_int_cst (type, 0); > > - loop_vec_info epilogue_vinfo = NULL; > - if (vect_epilogues) > - { > - epilogue_vinfo = loop_vinfo->epilogue_vinfos[0]; > - loop_vinfo->epilogue_vinfos.ordered_remove (0); > - } > - > + loop_vec_info epilogue_vinfo = loop_vinfo->epilogue_vinfo; > tree niters_vector_mult_vf = NULL_TREE; > /* Saving NITERs before the loop, as this may be changed by prologue. */ > tree before_loop_niters = LOOP_VINFO_NITERS (loop_vinfo); > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index 6059ce031d1..5de90b6573a 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -1071,7 +1071,9 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, > vec_info_shared *shared) > has_mask_store (false), > scalar_loop_scaling (profile_probability::uninitialized ()), > scalar_loop (NULL), > + main_loop_info (NULL), > orig_loop_info (NULL), > + epilogue_vinfo (NULL), > drs_advanced_by (NULL_TREE), > vec_loop_iv_exit (NULL), > vec_epilogue_loop_iv_exit (NULL), > @@ -1128,8 +1130,6 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, > vec_info_shared *shared) > } > } > } > - > - epilogue_vinfos.create (6); > } > > /* Free all levels of rgroup CONTROLS. */ > @@ -1155,7 +1155,6 @@ _loop_vec_info::~_loop_vec_info () > release_vec_loop_controls (&lens); > delete ivexpr_map; > delete scan_map; > - epilogue_vinfos.release (); > delete scalar_costs; > delete vector_costs; > > @@ -1936,15 +1935,20 @@ vect_analyze_loop_form (class loop *loop, gimple > *loop_vectorized_call, > loop_vec_info > vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared, > const vect_loop_form_info *info, > - loop_vec_info main_loop_info) > + loop_vec_info orig_loop_info) > { > loop_vec_info loop_vinfo = new _loop_vec_info (loop, shared); > LOOP_VINFO_NITERSM1 (loop_vinfo) = info->number_of_iterationsm1; > LOOP_VINFO_NITERS (loop_vinfo) = info->number_of_iterations; > LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo) = info->number_of_iterations; > - LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = main_loop_info; > + LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = orig_loop_info; > + if (orig_loop_info && LOOP_VINFO_EPILOGUE_P (orig_loop_info)) > + LOOP_VINFO_MAIN_LOOP_INFO (loop_vinfo) > + = LOOP_VINFO_MAIN_LOOP_INFO (orig_loop_info); > + else > + LOOP_VINFO_MAIN_LOOP_INFO (loop_vinfo) = orig_loop_info; > /* Also record the assumptions for versioning. */ > - if (!integer_onep (info->assumptions) && !main_loop_info) > + if (!integer_onep (info->assumptions) && !orig_loop_info) > LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions; > > for (gcond *cond : info->conds) > @@ -2314,17 +2318,19 @@ vect_analyze_loop_costing (loop_vec_info > loop_vinfo, > { > loop_vec_info orig_loop_vinfo > = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo); > + loop_vec_info main_loop_vinfo > + = LOOP_VINFO_MAIN_LOOP_INFO (loop_vinfo); > unsigned lowest_vf > = constant_lower_bound (LOOP_VINFO_VECT_FACTOR > (orig_loop_vinfo)); > int prolog_peeling = 0; > - if (!vect_use_loop_mask_for_alignment_p (loop_vinfo)) > - prolog_peeling = LOOP_VINFO_PEELING_FOR_ALIGNMENT > (orig_loop_vinfo); > + if (!vect_use_loop_mask_for_alignment_p (main_loop_vinfo)) > + prolog_peeling = LOOP_VINFO_PEELING_FOR_ALIGNMENT > (main_loop_vinfo); > if (prolog_peeling >= 0 > && known_eq (LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo), > lowest_vf)) > { > unsigned gap > - = LOOP_VINFO_PEELING_FOR_GAPS (orig_loop_vinfo) ? 1 : 0; > + = LOOP_VINFO_PEELING_FOR_GAPS (main_loop_vinfo) ? 1 : 0; > scalar_niters = ((scalar_niters - gap - prolog_peeling) > % lowest_vf + gap); > } > @@ -3440,7 +3446,7 @@ vect_joust_loop_vinfos (loop_vec_info > new_loop_vinfo, > return true; > } > > -/* Analyze LOOP with VECTOR_MODES[MODE_I] and as epilogue if > MAIN_LOOP_VINFO is > +/* Analyze LOOP with VECTOR_MODES[MODE_I] and as epilogue if > ORIG_LOOP_VINFO is > not NULL. Set AUTODETECTED_VECTOR_MODE if VOIDmode and advance > MODE_I to the next mode useful to analyze. > Return the loop_vinfo on success and wrapped null on failure. */ > @@ -3448,13 +3454,13 @@ vect_joust_loop_vinfos (loop_vec_info > new_loop_vinfo, > static opt_loop_vec_info > vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared, > const vect_loop_form_info *loop_form_info, > - loop_vec_info main_loop_vinfo, > + loop_vec_info orig_loop_vinfo, > const vector_modes &vector_modes, unsigned &mode_i, > machine_mode &autodetected_vector_mode, > bool &fatal) > { > loop_vec_info loop_vinfo > - = vect_create_loop_vinfo (loop, shared, loop_form_info, main_loop_vinfo); > + = vect_create_loop_vinfo (loop, shared, loop_form_info, orig_loop_vinfo); > > machine_mode vector_mode = vector_modes[mode_i]; > loop_vinfo->vector_mode = vector_mode; > @@ -3471,7 +3477,7 @@ vect_analyze_loop_1 (class loop *loop, > vec_info_shared *shared, > res ? "succeeded" : "failed", > GET_MODE_NAME (loop_vinfo->vector_mode)); > > - if (res && !main_loop_vinfo && suggested_unroll_factor > 1) > + if (res && !LOOP_VINFO_EPILOGUE_P (loop_vinfo) && suggested_unroll_factor > > 1) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > @@ -3480,7 +3486,7 @@ vect_analyze_loop_1 (class loop *loop, > vec_info_shared *shared, > suggested_unroll_factor, > slp_done_for_suggested_uf ? "on" : "off"); > loop_vec_info unroll_vinfo > - = vect_create_loop_vinfo (loop, shared, loop_form_info, > main_loop_vinfo); > + = vect_create_loop_vinfo (loop, shared, loop_form_info, NULL); > unroll_vinfo->vector_mode = vector_mode; > unroll_vinfo->suggested_unroll_factor = suggested_unroll_factor; > opt_result new_res = vect_analyze_loop_2 (unroll_vinfo, fatal, NULL, > @@ -3534,7 +3540,7 @@ vect_analyze_loop_1 (class loop *loop, > vec_info_shared *shared, > { > delete loop_vinfo; > if (fatal) > - gcc_checking_assert (main_loop_vinfo == NULL); > + gcc_checking_assert (orig_loop_vinfo == NULL); > return opt_loop_vec_info::propagate_failure (res); > } > > @@ -3746,21 +3752,17 @@ vect_analyze_loop (class loop *loop, gimple > *loop_vectorized_call, > > if (loop_vinfo) > { > - if (pick_lowest_cost_p) > + if (pick_lowest_cost_p > + && orig_loop_vinfo->epilogue_vinfo > + && vect_joust_loop_vinfos (loop_vinfo, > + orig_loop_vinfo->epilogue_vinfo)) > { > - /* Keep trying to roll back vectorization attempts while the > - loop_vec_infos they produced were worse than this one. */ > - vec<loop_vec_info> &vinfos = first_loop_vinfo->epilogue_vinfos; > - while (!vinfos.is_empty () > - && vect_joust_loop_vinfos (loop_vinfo, vinfos.last ())) > - { > - gcc_assert (vect_epilogues); > - delete vinfos.pop (); > - } > + gcc_assert (vect_epilogues); > + delete orig_loop_vinfo->epilogue_vinfo; > + orig_loop_vinfo->epilogue_vinfo = nullptr; > } > - /* For now only allow one epilogue loop. */ > - if (first_loop_vinfo->epilogue_vinfos.is_empty ()) > - first_loop_vinfo->epilogue_vinfos.safe_push (loop_vinfo); > + if (!orig_loop_vinfo->epilogue_vinfo) > + orig_loop_vinfo->epilogue_vinfo = loop_vinfo; > else > { > delete loop_vinfo; > @@ -3779,11 +3781,12 @@ vect_analyze_loop (class loop *loop, gimple > *loop_vectorized_call, > > } > > - if (!first_loop_vinfo->epilogue_vinfos.is_empty ()) > + if (first_loop_vinfo->epilogue_vinfo) > { > poly_uint64 lowest_th > = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo); > - for (auto epilog_vinfo : first_loop_vinfo->epilogue_vinfos) > + loop_vec_info epilog_vinfo = first_loop_vinfo->epilogue_vinfo; > + do > { > poly_uint64 th = LOOP_VINFO_VERSIONING_THRESHOLD (epilog_vinfo); > gcc_assert (!LOOP_REQUIRES_VERSIONING (epilog_vinfo) > @@ -3791,13 +3794,15 @@ vect_analyze_loop (class loop *loop, gimple > *loop_vectorized_call, > /* Keep track of the known smallest versioning threshold. */ > if (ordered_p (lowest_th, th)) > lowest_th = ordered_min (lowest_th, th); > + epilog_vinfo = epilog_vinfo->epilogue_vinfo; > } > + while (epilog_vinfo); > LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo) = lowest_th; > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > "***** Choosing epilogue vector mode %s\n", > GET_MODE_NAME > - (first_loop_vinfo->epilogue_vinfos[0]->vector_mode)); > + (first_loop_vinfo->epilogue_vinfo->vector_mode)); > } > > return first_loop_vinfo; > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > index 5a1bd237beb..88db34b87b6 100644 > --- a/gcc/tree-vectorizer.h > +++ b/gcc/tree-vectorizer.h > @@ -987,12 +987,17 @@ public: > class loop *scalar_loop; > > /* For loops being epilogues of already vectorized loops > - this points to the original vectorized loop. Otherwise NULL. */ > + this points to the main vectorized loop. Otherwise NULL. */ > + _loop_vec_info *main_loop_info; > + > + /* For loops being epilogues of already vectorized loops > + this points to the preceeding vectorized (possibly epilogue) loop. > + Otherwise NULL. */ > _loop_vec_info *orig_loop_info; > > - /* Used to store loop_vec_infos of epilogues of this loop during > + /* Used to store loop_vec_infos of the epilogue of this loop during > analysis. */ > - vec<_loop_vec_info *> epilogue_vinfos; > + _loop_vec_info *epilogue_vinfo; > > /* If this is an epilogue loop the DR advancement applied. */ > tree drs_advanced_by; > @@ -1096,6 +1101,7 @@ public: > #define LOOP_VINFO_SCALAR_LOOP_SCALING(L) (L)->scalar_loop_scaling > #define LOOP_VINFO_HAS_MASK_STORE(L) (L)->has_mask_store > #define LOOP_VINFO_SCALAR_ITERATION_COST(L) (L)->scalar_cost_vec > +#define LOOP_VINFO_MAIN_LOOP_INFO(L) (L)->main_loop_info > #define LOOP_VINFO_ORIG_LOOP_INFO(L) (L)->orig_loop_info > #define LOOP_VINFO_SIMD_IF_COND(L) (L)->simd_if_cond > #define LOOP_VINFO_INNER_LOOP_COST_FACTOR(L) (L)- > >inner_loop_cost_factor > -- > 2.43.0