This patch adds a heuristic to the vectorizer when estimating the minimum profitable number of iterations. The heuristic is target-dependent, and is currently disabled for all targets except PowerPC. However, the intent is to make it general enough to be useful for other targets that want to opt in.

A previous patch addressed some PowerPC SPEC degradations by modifying the vector cost model values for vec_perm and vec_promote_demote. The values were set a little higher than their natural values because the natural values were not sufficient to prevent a poor vectorization choice. However, this is not the right long-term solution, since it can unnecessarily constrain other vectorization choices involving permute instructions.

Analysis of the badly vectorized loop (in sphinx3) showed that the problem was overcommitment of vector resources -- too many vector instructions issued without enough non-vector instructions available to cover the delays. The vector cost model assumes that instructions always have a constant cost, and doesn't have a way of judging this kind of "density" of vector instructions.

The present patch adds a heuristic to recognize when a loop is likely to overcommit resources, and adds a small penalty to the inside-loop cost to account for the expected stalls. The heuristic is parameterized with three target-specific values:

 * Density threshold: The heuristic will apply only when the percentage of inside-loop cost attributable to vectorized instructions exceeds this value.

 * Size threshold: The heuristic will apply only when the inside-loop cost exceeds this value.

 * Penalty: The inside-loop cost will be increased by this percentage value when the heuristic applies.

Thus only reasonably large loop bodies that are mostly vectorized instructions will be affected. By applying only a small percentage bump to the inside-loop cost, the heuristic will only turn off vectorization for loops that were considered "barely profitable" to begin with (such as the sphinx3 loop). So the heuristic is quite conservative and should not affect the vast majority of vectorization decisions.

Together with the new heuristic, this patch reduces the vec_perm and vec_promote_demote costs for PowerPC to their natural values.

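To make the interaction of the three parameters concrete, here is a minimal standalone sketch of the decision described above. It is an illustration, not code from the patch: struct density_params and apply_density_penalty are hypothetical names (the patch exposes the parameters as separate target hooks), and the early return for a zero-cost body is an assumption of the sketch.

```c
#include <assert.h>

/* Hypothetical container for the three target-specific values.  The
   patch itself exposes them as three separate target hooks.  */
struct density_params
{
  int pct_threshold;   /* density must exceed this percentage */
  int size_threshold;  /* total inside-loop cost must exceed this */
  int penalty_pct;     /* percentage added to the inside-loop cost */
};

/* Return VEC_INSIDE_COST after the density test.  REL_COST is the
   inside-loop cost attributed to vectorized statements; IRREL_COST is
   the cost attributed to everything else in the loop body.  */
static int
apply_density_penalty (int vec_inside_cost, int rel_cost, int irrel_cost,
                       const struct density_params *p)
{
  int total = rel_cost + irrel_cost;
  int density_pct;

  assert (rel_cost >= 0 && irrel_cost >= 0);

  /* Assumption of this sketch: a zero-cost body is left untouched,
     which also avoids dividing by zero below.  */
  if (total == 0)
    return vec_inside_cost;

  density_pct = (rel_cost * 100) / total;

  /* Both conditions must hold: the body is mostly vector work AND it
     is large enough for overcommitment to matter.  */
  if (density_pct > p->pct_threshold && total > p->size_threshold)
    return vec_inside_cost * (100 + p->penalty_pct) / 100;

  return vec_inside_cost;
}
```

With the values this patch chooses for PowerPC (85, 70, 10), a body with rel_cost 90 and irrel_cost 10 has its inside-loop cost of 100 raised to 110. A body with the same 90% density but a total cost of only 50 is left alone, as is a body of total cost 100 that is only 60% dense. With the default values (100, 1000000, 10) the penalty never applies, since an integer density cannot exceed 100.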
I've regstrapped this with no regressions on powerpc64-unknown-linux-gnu and verified that no performance regressions occur on SPEC cpu2006. Is this ok for trunk?

Thanks,
Bill


2012-06-08 Bill Schmidt <wschm...@linux.ibm.com>

	* doc/tm.texi.in: Add vectorization density hooks.
	* doc/tm.texi: Regenerate.
	* targhooks.c (default_density_pct_threshold): New.
	(default_density_size_threshold): New.
	(default_density_penalty): New.
	* targhooks.h: New decls for new targhooks.c functions.
	* target.def (density_pct_threshold): New DEF_HOOK.
	(density_size_threshold): Likewise.
	(density_penalty): Likewise.
	* tree-vect-loop.c (accum_stmt_cost): New.
	(vect_estimate_min_profitable_iters): Perform density test.
	* config/rs6000/rs6000.c (TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD):
	New macro definition.
	(TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD): Likewise.
	(TARGET_VECTORIZE_DENSITY_PENALTY): Likewise.
	(rs6000_builtin_vectorization_cost): Reduce costs of vec_perm and
	vec_promote_demote to correct values.
	(rs6000_density_pct_threshold): New.
	(rs6000_density_size_threshold): New.
	(rs6000_density_penalty): New.


Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi (revision 188305)
+++ gcc/doc/tm.texi (working copy)
@@ -5798,6 +5798,27 @@ The default is @code{NULL_TREE} which means to not
 loads.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD (void)
+This hook should return the maximum density, expressed in percent, for
+which autovectorization of loops with large bodies should be constrained.
+See also @code{TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD}. The default
+is to return 100, which disables the density test.
+@end deftypefn
+
+@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD (void)
+This hook should return the minimum estimated size of a vectorized
+loop body for which the density test should apply. See also
+@code{TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD}. The default is set
+to the unreasonable value of 1000000, which effectively disables
+the density test.
+@end deftypefn
+
+@deftypefn {Target Hook} int TARGET_VECTORIZE_DENSITY_PENALTY (void)
+This hook should return the penalty, expressed in percent, to be applied
+to the inside-of-loop vectorization costs for a loop failing the density
+test. The default is 10.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in (revision 188305)
+++ gcc/doc/tm.texi.in (working copy)
@@ -5730,6 +5730,27 @@ The default is @code{NULL_TREE} which means to not
 loads.
 @end deftypefn
 
+@hook TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD
+This hook should return the maximum density, expressed in percent, for
+which autovectorization of loops with large bodies should be constrained.
+See also @code{TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD}. The default
+is to return 100, which disables the density test.
+@end deftypefn
+
+@hook TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD
+This hook should return the minimum estimated size of a vectorized
+loop body for which the density test should apply. See also
+@code{TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD}. The default is set
+to the unreasonable value of 1000000, which effectively disables
+the density test.
+@end deftypefn
+
+@hook TARGET_VECTORIZE_DENSITY_PENALTY
+This hook should return the penalty, expressed in percent, to be applied
+to the inside-of-loop vectorization costs for a loop failing the density
+test. The default is 10.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c (revision 188305)
+++ gcc/targhooks.c (working copy)
@@ -990,6 +990,33 @@ default_autovectorize_vector_sizes (void)
   return 0;
 }
 
+/* By default, the density test for autovectorization is disabled by
+   setting the minimum percentage to 100. */
+
+int
+default_density_pct_threshold (void)
+{
+  return 100;
+}
+
+/* By default, the density size threshold for autovectorization is
+   meaningless since the density test is disabled. An unreasonably
+   large number is used to further inhibit the density test. */
+
+int
+default_density_size_threshold (void)
+{
+  return 1000000;
+}
+
+/* By default, the density penalty for autovectorization is set to 10%. */
+
+int
+default_density_penalty (void)
+{
+  return 10;
+}
+
 /* Determine whether or not a pointer mode is valid. Assume defaults
    of ptr_mode or Pmode - can be overridden. */
 bool
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h (revision 188305)
+++ gcc/targhooks.h (working copy)
@@ -90,6 +90,9 @@ default_builtin_support_vector_misalignment (enum
                                              int, bool);
 extern enum machine_mode default_preferred_simd_mode (enum machine_mode mode);
 extern unsigned int default_autovectorize_vector_sizes (void);
+extern int default_density_pct_threshold (void);
+extern int default_density_size_threshold (void);
+extern int default_density_penalty (void);
 
 /* These are here, and not in hooks.[ch], because not all users of
    hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS. */
Index: gcc/target.def
===================================================================
--- gcc/target.def (revision 188305)
+++ gcc/target.def (working copy)
@@ -1054,6 +1054,32 @@ DEFHOOK
  (const_tree mem_vectype, const_tree index_type, int scale),
  NULL)
 
+/* Return the maximum density in percent for loop vectorization. */
+DEFHOOK
+(density_pct_threshold,
+ "",
+ int,
+ (void),
+ default_density_pct_threshold)
+
+/* Return the minimum size of a loop iteration for applying the density
+   test for loop vectorization. */
+DEFHOOK
+(density_size_threshold,
+ "",
+ int,
+ (void),
+ default_density_size_threshold)
+
+/* Return the penalty in percent for vectorizing a loop failing the
+   density test. */
+DEFHOOK
+(density_penalty,
+ "",
+ int,
+ (void),
+ default_density_penalty)
+
 HOOK_VECTOR_END (vectorize)
 
 #undef HOOK_PREFIX
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c (revision 188305)
+++ gcc/tree-vect-loop.c (working copy)
@@ -2485,6 +2485,58 @@ vect_get_known_peeling_cost (loop_vec_info loop_vi
            + peel_guard_costs;
 }
 
+/* Add the inside-loop cost of STMT to either *REL_COST or *IRREL_COST,
+   depending on whether or not STMT will be vectorized. For vectorized
+   statements, the inside-loop cost is as already computed. For other
+   statements, assume a cost of one. */
+
+static void
+accum_stmt_cost (gimple stmt, int *rel_cost, int *irrel_cost)
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  gimple pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
+  gimple_seq pattern_def_seq;
+
+  /* If the statement is irrelevant, but it has a related pattern
+     statement that is relevant, process just the related statement.
+     If the statement is relevant and it has a related pattern
+     statement that is also relevant, process them both. */
+  if (!STMT_VINFO_RELEVANT_P (stmt_info)
+      && !STMT_VINFO_LIVE_P (stmt_info))
+    {
+      if (STMT_VINFO_IN_PATTERN_P (stmt_info)
+          && pattern_stmt
+          && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
+              || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
+        accum_stmt_cost (pattern_stmt, rel_cost, irrel_cost);
+      else
+        (*irrel_cost)++;
+    }
+  else if (STMT_VINFO_IN_PATTERN_P (stmt_info)
+           && pattern_stmt
+           && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
+               || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
+    accum_stmt_cost (pattern_stmt, rel_cost, irrel_cost);
+
+  /* If we're looking at a pattern that has additional statements,
+     count them as well. */
+  if (is_pattern_stmt_p (stmt_info)
+      && (pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info)))
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start (pattern_def_seq); !gsi_end_p (gsi); gsi_next (&gsi))
+        {
+          gimple pattern_def_stmt = gsi_stmt (gsi);
+          if (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_def_stmt))
+              || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_def_stmt)))
+            accum_stmt_cost (pattern_def_stmt, rel_cost, irrel_cost);
+        }
+    }
+
+  /* Accumulate the inside-loop cost of this vectorizable statement. */
+  *rel_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info);
+}
+
 /* Function vect_estimate_min_profitable_iters
 
    Return the number of iterations required for the vector version of the
@@ -2743,6 +2795,45 @@ vect_estimate_min_profitable_iters (loop_vec_info
       vec_inside_cost += SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance);
     }
 
+  /* Test for likely overcommitment of vector hardware resources. If a
+     loop iteration is relatively large, and too large a percentage of
+     instructions in the loop are vectorized, the cost model may not
+     adequately reflect delays from unavailable vector resources.
+     Penalize vec_inside_cost for this case, using target-specific
+     parameters. */
+  if (targetm.vectorize.density_pct_threshold () < 100)
+    {
+      int rel_cost = 0, irrel_cost = 0;
+      int density_pct;
+
+      for (i = 0; i < nbbs; i++)
+        {
+          basic_block bb = bbs[i];
+          gimple_stmt_iterator gsi;
+
+          for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+            {
+              gimple stmt = gsi_stmt (gsi);
+              accum_stmt_cost (stmt, &rel_cost, &irrel_cost);
+            }
+        }
+
+      density_pct = (rel_cost * 100) / (rel_cost + irrel_cost);
+
+      if (density_pct > targetm.vectorize.density_pct_threshold ()
+          && (rel_cost + irrel_cost
+              > targetm.vectorize.density_size_threshold ()))
+        {
+          int penalty = targetm.vectorize.density_penalty ();
+          vec_inside_cost = vec_inside_cost * (100 + penalty) / 100;
+          if (vect_print_dump_info (REPORT_DETAILS))
+            fprintf (vect_dump,
+                     "density %d%%, cost %d exceeds threshold"
+                     ", penalizing inside-loop cost by %d%%.",
+                     density_pct, rel_cost + irrel_cost, penalty);
+        }
+    }
+
   /* Calculate number of iterations required to make the vector version
      profitable, relative to the loop bodies only. The following condition
      must hold true:
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 188305)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -1289,6 +1289,15 @@ static const struct attribute_spec rs6000_attribut
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE \
   rs6000_preferred_simd_mode
+#undef TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD
+#define TARGET_VECTORIZE_DENSITY_PCT_THRESHOLD \
+  rs6000_density_pct_threshold
+#undef TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD
+#define TARGET_VECTORIZE_DENSITY_SIZE_THRESHOLD \
+  rs6000_density_size_threshold
+#undef TARGET_VECTORIZE_DENSITY_PENALTY
+#define TARGET_VECTORIZE_DENSITY_PENALTY \
+  rs6000_density_penalty
 
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
@@ -3421,13 +3430,13 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
 
       case vec_perm:
         if (TARGET_VSX)
-          return 4;
+          return 3;
         else
           return 1;
 
       case vec_promote_demote:
         if (TARGET_VSX)
-          return 5;
+          return 4;
         else
           return 1;
 
@@ -3551,6 +3560,30 @@ rs6000_preferred_simd_mode (enum machine_mode mode
   return word_mode;
 }
 
+/* Implement targetm.vectorize.density_pct_threshold. */
+
+static int
+rs6000_density_pct_threshold (void)
+{
+  return 85;
+}
+
+/* Implement targetm.vectorize.density_size_threshold. */
+
+static int
+rs6000_density_size_threshold (void)
+{
+  return 70;
+}
+
+/* Implement targetm.vectorize.density_penalty. */
+
+static int
+rs6000_density_penalty (void)
+{
+  return 10;
+}
+
 /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
    library with vectorized intrinsics. */
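
For readers following the tree-vect-loop.c hunk, the cost accumulation that feeds the density test can be modeled in isolation. The sketch below is a simplification under stated assumptions, not the patch code: struct stmt_model, accum_costs and density_pct are hypothetical names, each statement is reduced to a "will be vectorized" flag plus its already-computed inside-loop cost, and the pattern-statement handling done by accum_stmt_cost is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one statement of the loop body.  */
struct stmt_model
{
  int vectorized;   /* nonzero if the statement will be vectorized */
  int inside_cost;  /* inside-loop cost already computed for it */
};

/* Add each statement's cost to *REL_COST if it will be vectorized;
   otherwise count it as one unit in *IRREL_COST.  */
static void
accum_costs (const struct stmt_model *stmts, size_t n,
             int *rel_cost, int *irrel_cost)
{
  size_t i;

  assert (rel_cost != NULL && irrel_cost != NULL);

  for (i = 0; i < n; i++)
    {
      if (stmts[i].vectorized)
        *rel_cost += stmts[i].inside_cost;
      else
        (*irrel_cost)++;
    }
}

/* Percentage of the inside-loop cost attributable to vectorized
   statements.  Returning 0 for an empty body is an assumption of this
   sketch, made to avoid dividing by zero.  */
static int
density_pct (int rel_cost, int irrel_cost)
{
  int total = rel_cost + irrel_cost;

  return total == 0 ? 0 : (rel_cost * 100) / total;
}
```

For a body with three vectorized statements costing 4, 3 and 2 plus one scalar statement, this yields rel_cost 9, irrel_cost 1 and a density of 90%. That exceeds the PowerPC density threshold of 85, but the total cost of 10 is far below the size threshold of 70, so such a loop would not be penalized.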