On Fri, 8 Jun 2012, William J. Schmidt wrote:
> This patch adds a heuristic to the vectorizer when estimating the
> minimum profitable number of iterations. The heuristic is
> target-dependent, and is currently disabled for all targets except
> PowerPC. However, the intent is to make it general enough to be useful
> for other targets that want to opt in.
>
> A previous patch addressed some PowerPC SPEC degradations by modifying
> the vector cost model values for vec_perm and vec_promote_demote. The
> values were set a little higher than their natural values because the
> natural values were not sufficient to prevent a poor vectorization
> choice. However, this is not the right long-term solution, since it can
> unnecessarily constrain other vectorization choices involving permute
> instructions.
>
> Analysis of the badly vectorized loop (in sphinx3) showed that the
> problem was overcommitment of vector resources -- too many vector
> instructions issued without enough non-vector instructions available to
> cover the delays. The vector cost model assumes that instructions
> always have a constant cost, and doesn't have a way of judging this kind
> of "density" of vector instructions.
>
> The present patch adds a heuristic to recognize when a loop is likely to
> overcommit resources, and adds a small penalty to the inside-loop cost
> to account for the expected stalls. The heuristic is parameterized with
> three target-specific values:
>
> * Density threshold: The heuristic will apply only when the
> percentage of inside-loop cost attributable to vectorized
> instructions exceeds this value.
>
> * Size threshold: The heuristic will apply only when the
> inside-loop cost exceeds this value.
>
> * Penalty: The inside-loop cost will be increased by this
> percentage value when the heuristic applies.
>
> Thus only reasonably large loop bodies that are mostly vectorized
> instructions will be affected.
>
> By applying only a small percentage bump to the inside-loop cost, the
> heuristic will only turn off vectorization for loops that were
> considered "barely profitable" to begin with (such as the sphinx3 loop).
> So the heuristic is quite conservative and should not affect the vast
> majority of vectorization decisions.
>
> Together with the new heuristic, this patch reduces the vec_perm and
> vec_promote_demote costs for PowerPC to their natural values.
>
> I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu
> and verified that no performance regressions occur on SPEC cpu2006. Is
> this ok for trunk?
Hmm. I don't like this patch or its general idea too much. Instead
I'd like us to move more of the cost model detail to the target, giving
it a chance to look at the whole loop before deciding on a cost. ISTR
posting the overall idea at some point, but let me repeat it here instead
of trying to find that e-mail.
The basic interface of the cost model, in targetm.vectorize, should be:

  /* Tell the target to start cost analysis of a loop or a basic-block
     (if the loop argument is NULL).  Returns an opaque pointer to
     target-private data.  */
  void *init_cost (struct loop *loop);

  /* Add cost for N statements of kind KIND in VECTOR_MODE.  */
  void add_stmt_cost (void *data, unsigned n,
                      enum vect_cost_for_stmt kind,
                      enum machine_mode vector_mode);

  /* Tell the target to compute and return the cost of the accumulated
     statements and free any target-private data.  */
  unsigned finish_cost (void *data);
possibly with slightly different signatures for add_stmt_cost
(for example, passing in the original scalar stmt?).
It allows the target, at finish_cost time, to evaluate things like
register pressure and resource utilization.
Thanks,
Richard.
> Thanks,
> Bill
>
>
> 2012-06-08 Bill Schmidt <[email protected]>
>
> * doc/tm.texi.in: Add vectorization density hooks.
> * doc/tm.texi: Regenerate.
> * targhooks.c (default_density_pct_threshold): New.
> (default_density_size_threshold): New.
> (default_density_penalty): New.
> * targhooks.h: New decls for new targhooks.c functions.
> * target.def (density_pct_threshold): New DEF_HOOK.
> (density_size_threshold): Likewise.
> (density_penalty): Likewise.
> * tree-vect-loop.c (accum_stmt_cost): New.
> (vect_estimate_min_profitable_iters): Perform density test.
> * config/rs6000/rs6000.c (TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD):
> New macro definition.
> (TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD): Likewise.
> (TARGET_VECTORIZE_DENSITY_PENALTY): Likewise.
> (rs6000_builtin_vectorization_cost): Reduce costs of vec_perm and
> vec_promote_demote to correct values.
> (rs6000_density_pct_threshold): New.
> (rs6000_density_size_threshold): New.
> (rs6000_density_penalty): New.
>
>
> Index: gcc/doc/tm.texi
> ===================================================================
> --- gcc/doc/tm.texi (revision 188305)
> +++ gcc/doc/tm.texi (working copy)
> @@ -5798,6 +5798,27 @@ The default is @code{NULL_TREE} which means to not
> loads.
> @end deftypefn
>
> +@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD (void)
> +This hook should return the maximum density, expressed in percent, for
> +which autovectorization of loops with large bodies should be constrained.
> +See also @code{TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD}. The default
> +is to return 100, which disables the density test.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD (void)
> +This hook should return the minimum estimated size of a vectorized
> +loop body for which the density test should apply. See also
> +@code{TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD}. The default is set
> +to the unreasonable value of 1000000, which effectively disables
> +the density test.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_PENALTY (void)
> +This hook should return the penalty, expressed in percent, to be applied
> +to the inside-of-loop vectorization costs for a loop failing the density
> +test. The default is 10.
> +@end deftypefn
> +
> @node Anchored Addresses
> @section Anchored Addresses
> @cindex anchored addresses
> Index: gcc/doc/tm.texi.in
> ===================================================================
> --- gcc/doc/tm.texi.in (revision 188305)
> +++ gcc/doc/tm.texi.in (working copy)
> @@ -5730,6 +5730,27 @@ The default is @code{NULL_TREE} which means to not
> loads.
> @end deftypefn
>
> +@hook TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD
> +This hook should return the maximum density, expressed in percent, for
> +which autovectorization of loops with large bodies should be constrained.
> +See also @code{TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD}. The default
> +is to return 100, which disables the density test.
> +@end deftypefn
> +
> +@hook TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD
> +This hook should return the minimum estimated size of a vectorized
> +loop body for which the density test should apply. See also
> +@code{TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD}. The default is set
> +to the unreasonable value of 1000000, which effectively disables
> +the density test.
> +@end deftypefn
> +
> +@hook TARGET_VECTORIZE_DENSITY_PENALTY
> +This hook should return the penalty, expressed in percent, to be applied
> +to the inside-of-loop vectorization costs for a loop failing the density
> +test. The default is 10.
> +@end deftypefn
> +
> @node Anchored Addresses
> @section Anchored Addresses
> @cindex anchored addresses
> Index: gcc/targhooks.c
> ===================================================================
> --- gcc/targhooks.c (revision 188305)
> +++ gcc/targhooks.c (working copy)
> @@ -990,6 +990,33 @@ default_autovectorize_vector_sizes (void)
> return 0;
> }
>
> +/* By default, the density test for autovectorization is disabled by
> + setting the minimum percentage to 100. */
> +
> +int
> +default_density_pct_threshold (void)
> +{
> + return 100;
> +}
> +
> +/* By default, the density size threshold for autovectorization is
> + meaningless since the density test is disabled. An unreasonably
> + large number is used to further inhibit the density test. */
> +
> +int
> +default_density_size_threshold (void)
> +{
> + return 1000000;
> +}
> +
> +/* By default, the density penalty for autovectorization is set to 10%. */
> +
> +int
> +default_density_penalty (void)
> +{
> + return 10;
> +}
> +
> /* Determine whether or not a pointer mode is valid. Assume defaults
> of ptr_mode or Pmode - can be overridden. */
> bool
> Index: gcc/targhooks.h
> ===================================================================
> --- gcc/targhooks.h (revision 188305)
> +++ gcc/targhooks.h (working copy)
> @@ -90,6 +90,9 @@ default_builtin_support_vector_misalignment (enum
> int, bool);
> extern enum machine_mode default_preferred_simd_mode (enum machine_mode
> mode);
> extern unsigned int default_autovectorize_vector_sizes (void);
> +extern int default_density_pct_threshold (void);
> +extern int default_density_size_threshold (void);
> +extern int default_density_penalty (void);
>
> /* These are here, and not in hooks.[ch], because not all users of
> hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS. */
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def (revision 188305)
> +++ gcc/target.def (working copy)
> @@ -1054,6 +1054,32 @@ DEFHOOK
> (const_tree mem_vectype, const_tree index_type, int scale),
> NULL)
>
> +/* Return the maximum density in percent for loop vectorization. */
> +DEFHOOK
> +(density_pct_threshold,
> +"",
> +int,
> +(void),
> +default_density_pct_threshold)
> +
> +/* Return the minimum size of a loop iteration for applying the density
> + test for loop vectorization. */
> +DEFHOOK
> +(density_size_threshold,
> +"",
> +int,
> +(void),
> +default_density_size_threshold)
> +
> +/* Return the penalty in percent for vectorizing a loop failing the
> + density test. */
> +DEFHOOK
> +(density_penalty,
> +"",
> +int,
> +(void),
> +default_density_penalty)
> +
> HOOK_VECTOR_END (vectorize)
>
> #undef HOOK_PREFIX
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c (revision 188305)
> +++ gcc/tree-vect-loop.c (working copy)
> @@ -2485,6 +2485,58 @@ vect_get_known_peeling_cost (loop_vec_info loop_vi
> + peel_guard_costs;
> }
>
> +/* Add the inside-loop cost of STMT to either *REL_COST or *IRREL_COST,
> + depending on whether or not STMT will be vectorized. For vectorized
> + statements, the inside-loop cost is as already computed. For other
> + statements, assume a cost of one. */
> +
> +static void
> +accum_stmt_cost (gimple stmt, int *rel_cost, int *irrel_cost)
> +{
> + stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> + gimple pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
> + gimple_seq pattern_def_seq;
> +
> + /* If the statement is irrelevant, but it has a related pattern
> + statement that is relevant, process just the related statement.
> + If the statement is relevant and it has a related pattern
> + statement that is also relevant, process them both. */
> + if (!STMT_VINFO_RELEVANT_P (stmt_info)
> + && !STMT_VINFO_LIVE_P (stmt_info))
> + {
> + if (STMT_VINFO_IN_PATTERN_P (stmt_info)
> + && pattern_stmt
> + && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
> + || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
> + accum_stmt_cost (pattern_stmt, rel_cost, irrel_cost);
> + else
> + (*irrel_cost)++;
> + }
> + else if (STMT_VINFO_IN_PATTERN_P (stmt_info)
> + && pattern_stmt
> + && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
> + || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
> + accum_stmt_cost (pattern_stmt, rel_cost, irrel_cost);
> +
> + /* If we're looking at a pattern that has additional statements,
> + count them as well. */
> + if (is_pattern_stmt_p (stmt_info)
> + && (pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info)))
> + {
> + gimple_stmt_iterator gsi;
> + for (gsi = gsi_start (pattern_def_seq); !gsi_end_p (gsi); gsi_next (&gsi))
> + {
> + gimple pattern_def_stmt = gsi_stmt (gsi);
> + if (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_def_stmt))
> + || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_def_stmt)))
> + accum_stmt_cost (pattern_def_stmt, rel_cost, irrel_cost);
> + }
> + }
> +
> + /* Accumulate the inside-loop cost of this vectorizable statement. */
> + *rel_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info);
> +}
> +
> /* Function vect_estimate_min_profitable_iters
>
> Return the number of iterations required for the vector version of the
> @@ -2743,6 +2795,45 @@ vect_estimate_min_profitable_iters (loop_vec_info
> vec_inside_cost += SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance);
> }
>
> + /* Test for likely overcommitment of vector hardware resources. If a
> + loop iteration is relatively large, and too large a percentage of
> + instructions in the loop are vectorized, the cost model may not
> + adequately reflect delays from unavailable vector resources.
> + Penalize vec_inside_cost for this case, using target-specific
> + parameters. */
> + if (targetm.vectorize.density_pct_threshold () < 100)
> + {
> + int rel_cost = 0, irrel_cost = 0;
> + int density_pct;
> +
> + for (i = 0; i < nbbs; i++)
> + {
> + basic_block bb = bbs[i];
> + gimple_stmt_iterator gsi;
> +
> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> + {
> + gimple stmt = gsi_stmt (gsi);
> + accum_stmt_cost (stmt, &rel_cost, &irrel_cost);
> + }
> + }
> +
> + density_pct = (rel_cost * 100) / (rel_cost + irrel_cost);
> +
> + if (density_pct > targetm.vectorize.density_pct_threshold ()
> + && (rel_cost + irrel_cost
> + > targetm.vectorize.density_size_threshold ()))
> + {
> + int penalty = targetm.vectorize.density_penalty ();
> + vec_inside_cost = vec_inside_cost * (100 + penalty) / 100;
> + if (vect_print_dump_info (REPORT_DETAILS))
> + fprintf (vect_dump,
> + "density %d%%, cost %d exceeds threshold"
> + ", penalizing inside-loop cost by %d%%.",
> + density_pct, rel_cost + irrel_cost, penalty);
> + }
> + }
> +
> /* Calculate number of iterations required to make the vector version
> profitable, relative to the loop bodies only. The following condition
> must hold true:
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c (revision 188305)
> +++ gcc/config/rs6000/rs6000.c (working copy)
> @@ -1289,6 +1289,15 @@ static const struct attribute_spec rs6000_attribut
> #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
> #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE \
> rs6000_preferred_simd_mode
> +#undef TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD
> +#define TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD \
> + rs6000_density_pct_threshold
> +#undef TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD
> +#define TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD \
> + rs6000_density_size_threshold
> +#undef TARGET_VECTORIZE_DENSITY_PENALTY
> +#define TARGET_VECTORIZE_DENSITY_PENALTY \
> + rs6000_density_penalty
>
> #undef TARGET_INIT_BUILTINS
> #define TARGET_INIT_BUILTINS rs6000_init_builtins
> @@ -3421,13 +3430,13 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
>
> case vec_perm:
> if (TARGET_VSX)
> - return 4;
> + return 3;
> else
> return 1;
>
> case vec_promote_demote:
> if (TARGET_VSX)
> - return 5;
> + return 4;
> else
> return 1;
>
> @@ -3551,6 +3560,30 @@ rs6000_preferred_simd_mode (enum machine_mode mode
> return word_mode;
> }
>
> +/* Implement targetm.vectorize.density_pct_threshold. */
> +
> +static int
> +rs6000_density_pct_threshold (void)
> +{
> + return 85;
> +}
> +
> +/* Implement targetm.vectorize.density_size_threshold. */
> +
> +static int
> +rs6000_density_size_threshold (void)
> +{
> + return 70;
> +}
> +
> +/* Implement targetm.vectorize.density_penalty. */
> +
> +static int
> +rs6000_density_penalty (void)
> +{
> + return 10;
> +}
> +
> /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
> library with vectorized intrinsics. */
>
--
Richard Guenther <[email protected]>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer