On Tue, Nov 5, 2019 at 3:28 PM Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> vect_analyze_loop_costing uses two profitability thresholds: a runtime
> one and a static compile-time one. The runtime one is simply the point
> at which the vector loop is cheaper than the scalar loop, while the
> static one also takes into account the cost of choosing between the
> scalar and vector loops at runtime. We compare this static cost against
> the expected execution frequency to decide whether it's worth generating
> any vector code at all.
>
> However, we never reclaimed the cost of applying the runtime threshold
> if it turned out that the vector code can always be used. And we only
> know whether that's true once we've calculated what the runtime
> threshold would be.
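
For anyone following along, here is a minimal standalone sketch of the decision described above. It is only a simplified model, not the vectorizer's code: the LoopCostModel struct, its field names and both helper functions are hypothetical stand-ins for the loop_vec_info accessors, and the thresholds are treated as plain inputs rather than being computed from a cost model.

```cpp
// Hypothetical, simplified stand-in for the parts of loop_vec_info that
// matter for this decision.  None of these names exist in GCC.
struct LoopCostModel {
  bool niters_known;           // iteration count known at compile time
  bool requires_versioning;    // loop needs versioning for another reason
  bool peeling_for_niter;      // leftover iterations need an epilogue
  bool peeling_for_alignment;  // a prologue is needed for alignment
  unsigned runtime_threshold;  // "th": the runtime profitability threshold
  unsigned vf;                 // vectorization factor used for costing
  unsigned min_profitable_iters;     // point where vector beats scalar
  unsigned min_profitable_estimate;  // same, plus the cost of choosing
};

// Same shape as the condition the patch factors out: a runtime check is
// only applied if the iteration count is unknown and the threshold is at
// least the vectorization factor.
inline bool needs_runtime_profitability_check(const LoopCostModel &m) {
  return !m.niters_known && m.runtime_threshold >= m.vf;
}

// Static threshold to compare against the expected iteration count.  If
// nothing forces a runtime choice between the scalar and vector loops,
// the cost of making that choice is dropped from the estimate.
inline unsigned static_threshold(const LoopCostModel &m) {
  bool no_runtime_choice = !m.requires_versioning
                           && !m.peeling_for_niter
                           && !m.peeling_for_alignment
                           && !needs_runtime_profitability_check(m);
  if (m.min_profitable_estimate > m.min_profitable_iters && no_runtime_choice)
    return m.min_profitable_iters;
  return m.min_profitable_estimate;
}
```

With a runtime threshold below the vectorization factor and no versioning or peeling, static_threshold falls back to min_profitable_iters; raising the runtime threshold to at least the vectorization factor keeps the larger min_profitable_estimate.
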

OK.

>
> 2019-11-04 Richard Sandiford <richard.sandif...@arm.com>
>
> gcc/
>         * tree-vectorizer.h (vect_apply_runtime_profitability_check_p):
>         New function.
>         * tree-vect-loop-manip.c (vect_loop_versioning): Use it.
>         * tree-vect-loop.c (vect_analyze_loop_2): Likewise.
>         (vect_transform_loop): Likewise.
>         (vect_analyze_loop_costing): Don't take the cost of versioning
>         into account for the static profitability threshold if it turns
>         out that no versioning is needed.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h 2019-11-05 11:14:42.786884473 +0000
> +++ gcc/tree-vectorizer.h 2019-11-05 14:19:33.829371745 +0000
> @@ -1557,6 +1557,17 @@ vect_get_scalar_dr_size (dr_vec_info *dr
>    return tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_info->dr))));
>  }
>
> +/* Return true if LOOP_VINFO requires a runtime check for whether the
> +   vector loop is profitable. */
> +
> +inline bool
> +vect_apply_runtime_profitability_check_p (loop_vec_info loop_vinfo)
> +{
> +  unsigned int th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> +  return (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +          && th >= vect_vf_for_cost (loop_vinfo));
> +}
> +
>  /* Source location + hotness information. */
>  extern dump_user_location_t vect_location;
>
> Index: gcc/tree-vect-loop-manip.c
> ===================================================================
> --- gcc/tree-vect-loop-manip.c 2019-11-05 10:38:31.838181047 +0000
> +++ gcc/tree-vect-loop-manip.c 2019-11-05 14:19:33.825371773 +0000
> @@ -3173,8 +3173,7 @@ vect_loop_versioning (loop_vec_info loop
>      = LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND (loop_vinfo);
>    unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
>
> -  if (th >= vect_vf_for_cost (loop_vinfo)
> -      && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +  if (vect_apply_runtime_profitability_check_p (loop_vinfo)
>        && !ordered_p (th, versioning_threshold))
>      cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
>                               build_int_cst (TREE_TYPE (scalar_loop_iters),
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c 2019-11-05 11:14:42.782884501 +0000
> +++ gcc/tree-vect-loop.c 2019-11-05 14:19:33.829371745 +0000
> @@ -1689,6 +1689,24 @@ vect_analyze_loop_costing (loop_vec_info
>        return 0;
>      }
>
> +  /* The static profitablity threshold min_profitable_estimate includes
> +     the cost of having to check at runtime whether the scalar loop
> +     should be used instead. If it turns out that we don't need or want
> +     such a check, the threshold we should use for the static estimate
> +     is simply the point at which the vector loop becomes more profitable
> +     than the scalar loop. */
> +  if (min_profitable_estimate > min_profitable_iters
> +      && !LOOP_REQUIRES_VERSIONING (loop_vinfo)
> +      && !LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> +      && !LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> +      && !vect_apply_runtime_profitability_check_p (loop_vinfo))
> +    {
> +      if (dump_enabled_p ())
> +        dump_printf_loc (MSG_NOTE, vect_location, "no need for a runtime"
> +                         " choice between the scalar and vector loops\n");
> +      min_profitable_estimate = min_profitable_iters;
> +    }
> +
>    HOST_WIDE_INT estimated_niter;
>
>    /* If we are vectorizing an epilogue then we know the maximum number of
> @@ -2225,8 +2243,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
>
>    /* Use the same condition as vect_transform_loop to decide when to use
>       the cost to determine a versioning threshold. */
> -  if (th >= vect_vf_for_cost (loop_vinfo)
> -      && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +  if (vect_apply_runtime_profitability_check_p (loop_vinfo)
>        && ordered_p (th, niters_th))
>      niters_th = ordered_max (poly_uint64 (th), niters_th);
>
> @@ -8268,14 +8285,13 @@ vect_transform_loop (loop_vec_info loop_
>       run at least the (estimated) vectorization factor number of times
>       checking is pointless, too. */
>    th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> -  if (th >= vect_vf_for_cost (loop_vinfo)
> -      && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> +  if (vect_apply_runtime_profitability_check_p (loop_vinfo))
>      {
> -      if (dump_enabled_p ())
> -        dump_printf_loc (MSG_NOTE, vect_location,
> -                         "Profitability threshold is %d loop iterations.\n",
> -                         th);
> -      check_profitability = true;
> +      if (dump_enabled_p ())
> +        dump_printf_loc (MSG_NOTE, vect_location,
> +                         "Profitability threshold is %d loop iterations.\n",
> +                         th);
> +      check_profitability = true;
>      }
>
>    /* Make sure there exists a single-predecessor exit bb. Do this before