Hi,

After hookizing MOVE_BY_PIECES_P and migrating tree-inline.c, we are left with only one user of MOVE_RATIO: deciding the maximum size of aggregate for SRA.
Past discussions have made it clear [1] that keeping this use of MOVE_RATIO is undesirable. Clearly it is now also misnamed.

The previous iteration of this patch was rejected as too complicated. I went off and tried simplifying it to use MOVE_RATIO, but if we do that we end up breaking some interface boundaries between the driver and the backend.

This patch partially hookizes MOVE_RATIO under the new name TARGET_MAX_SCALARIZATION_SIZE and uses it to set default values for two new parameters:

  sra-max-scalarization-size-Ospeed - The maximum size of aggregate to consider when compiling for speed
  sra-max-scalarization-size-Osize - The maximum size of aggregate to consider when compiling for size.

We then modify SRA to use these parameters rather than MOVE_RATIO.

Bootstrapped and regression tested for x86, arm and aarch64 with no issues.

OK for trunk?

[1]: https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01997.html

---
gcc/

2014-09-25  James Greenhalgh  <james.greenha...@arm.com>

	* doc/invoke.texi (sra-max-scalarization-size-Ospeed): Document.
	(sra-max-scalarization-size-Osize): Likewise.
	* doc/tm.texi.in (MOVE_RATIO): Reduce documentation to a stub,
	deprecate.
	(TARGET_MAX_SCALARIZATION_SIZE): Add hook.
	* doc/tm.texi: Regenerate.
	* defaults.h (MOVE_RATIO): Remove default implementation.
	(SET_RATIO): Add a default implementation if MOVE_RATIO is not
	defined.
	* params.def (sra-max-scalarization-size-Ospeed): New.
	(sra-max-scalarization-size-Osize): Likewise.
	* target.def (max_scalarization_size): New.
	* targhooks.c (default_max_scalarization_size): New.
	* targhooks.h (default_max_scalarization_size): New.
	* tree-sra.c (get_max_scalarization_size): New.
	(analyze_all_variable_accesses): Use it.
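For reviewers who want the selection logic at a glance, here is a minimal standalone sketch of how the effective limit is chosen: a non-zero parameter value wins, otherwise the target default (move ratio times MOVE_MAX_PIECES) is used. All `sketch_` names are hypothetical stand-ins and are not part of the patch; the value 8 for MOVE_MAX_PIECES and the 15/3 move ratios are assumptions chosen only to make the arithmetic concrete for a target without movmem patterns.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the target's MOVE_MAX_PIECES (assumed 8).  */
#define SKETCH_MOVE_MAX_PIECES 8u

/* Hypothetical stand-in for get_move_ratio (): assumes the generic
   defaults for a target with no movmem patterns.  */
static unsigned int
sketch_move_ratio (bool speed_p)
{
  return speed_p ? 15u : 3u;
}

/* Mirrors the shape of default_max_scalarization_size: the default
   limit, in storage units, is the move ratio scaled by the piece size.  */
static unsigned int
sketch_default_max_scalarization_size (bool speed_p)
{
  return sketch_move_ratio (speed_p) * SKETCH_MOVE_MAX_PIECES;
}

/* Mirrors the shape of get_max_scalarization_size: PARAM_VALUE is the
   user-supplied parameter, where 0 means "not set".  */
static unsigned int
sketch_get_max_scalarization_size (unsigned int param_value, bool speed_p)
{
  if (param_value == 0)
    return sketch_default_max_scalarization_size (speed_p);

  return param_value;
}
```

Under these assumptions, leaving both parameters at 0 gives 120 storage units when compiling for speed and 24 when compiling for size, while something like `--param sra-max-scalarization-size-Ospeed=32` would override the speed case only.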
diff --git a/gcc/defaults.h b/gcc/defaults.h
index c1776b0..f723e2c 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1191,18 +1191,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
 #define BRANCH_COST(speed_p, predictable_p) 1
 #endif
 
-/* If a memory-to-memory move would take MOVE_RATIO or more simple
-   move-instruction sequences, we will do a movmem or libcall instead. */
-
-#ifndef MOVE_RATIO
-#if defined (HAVE_movmemqi) || defined (HAVE_movmemhi) || defined (HAVE_movmemsi) || defined (HAVE_movmemdi) || defined (HAVE_movmemti)
-#define MOVE_RATIO(speed) 2
-#else
-/* If we are optimizing for space (-Os), cut down the default move ratio. */
-#define MOVE_RATIO(speed) ((speed) ? 15 : 3)
-#endif
-#endif
-
 /* If a clear memory operation would take CLEAR_RATIO or more simple
    move-instruction sequences, we will do a setmem or libcall instead. */
 
@@ -1219,7 +1207,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
    SET_RATIO or more simple move-instruction sequences, we will do a movmem
    or libcall instead. */
 #ifndef SET_RATIO
+#ifdef MOVE_RATIO
 #define SET_RATIO(speed) MOVE_RATIO (speed)
+#elif defined (HAVE_movmemqi) || defined (HAVE_movmemhi) || defined (HAVE_movmemsi) || defined (HAVE_movmemdi) || defined (HAVE_movmemti)
+#define SET_RATIO(speed) 2
+#else
+/* If we are optimizing for space (-Os), cut down the default move ratio. */
+#define SET_RATIO(speed) ((speed) ? 15 : 3)
+#endif
 #endif
 
 /* Supply a default definition for FUNCTION_ARG_PADDING:
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index eae4ab1..c3e6eaa 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10301,6 +10301,16 @@ parameters only when their cumulative size is less or equal to
 @option{ipa-sra-ptr-growth-factor} times the size of the original
 pointer parameter.
 
+@item sra-max-scalarization-size-Ospeed
+@item sra-max-scalarization-size-Osize
+The two Scalar Replacement of Aggregates passes (SRA and IPA-SRA) aim to
+replace scalar parts of aggregates with uses of independent scalar
+variables. These parameters control the maximum size, in storage units,
+of aggregate which will be considered for replacement when compiling for
+speed
+(@option{sra-max-scalarization-size-Ospeed}) or size
+(@option{sra-max-scalarization-size-Osize}) respectively.
+
 @item tm-max-aggregate-size
 When making copies of thread-local variables in a transaction, this
 parameter specifies the size in bytes after which variables are
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f59641a..b4061eb 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6098,20 +6098,25 @@ this macro is defined, it should produce a nonzero value when
 @end defmac
 
 @defmac MOVE_RATIO (@var{speed})
-The threshold of number of scalar memory-to-memory move insns, @emph{below}
-which a sequence of insns should be generated instead of a
-string move insn or a library call. Increasing the value will always
-make code faster, but eventually incurs high cost in increased code size.
+This macro is deprecated and is only used to guide the default behaviours
+of @code{TARGET_MOVE_BY_PIECES_PROFITABLE_P} and
+@code{TARGET_MAX_SCALARIZATION_SIZE}. New ports should implement
+those hooks in preference to this macro.
+@end defmac
 
-Note that on machines where the corresponding move insn is a
-@code{define_expand} that emits a sequence of insns, this macro counts
-the number of such sequences.
+@deftypefn {Target Hook} {unsigned int} TARGET_MAX_SCALARIZATION_SIZE (bool @var{speed_p})
+This target hook is used by the Scalar Replacement of Aggregates passes
+(SRA and IPA-SRA). This hook gives the maximum size, in storage units,
+of aggregate to consider for replacement. @var{speed_p} is true if we are
+currently compiling for speed.
 
-The parameter @var{speed} is true if the code is currently being
-optimized for speed rather than size.
+By default, the maximum scalarization size is determined by MOVE_RATIO,
+if it is defined. Otherwise, a sensible default is chosen.
 
-If you don't define this, a reasonable default is used.
-@end defmac
+Note that a user may choose to override this target hook with the
+parameters @code{sra-max-scalarization-size-Ospeed} and
+@code{sra-max-scalarization-size-Osize}.
+@end deftypefn
 
 @defmac MOVE_BY_PIECES_P (@var{size}, @var{alignment})
 A C expression used to implement the default behaviour of
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index d2a4386..bdd1ec4 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4581,21 +4581,14 @@ this macro is defined, it should produce a nonzero value when
 @end defmac
 
 @defmac MOVE_RATIO (@var{speed})
-The threshold of number of scalar memory-to-memory move insns, @emph{below}
-which a sequence of insns should be generated instead of a
-string move insn or a library call. Increasing the value will always
-make code faster, but eventually incurs high cost in increased code size.
-
-Note that on machines where the corresponding move insn is a
-@code{define_expand} that emits a sequence of insns, this macro counts
-the number of such sequences.
-
-The parameter @var{speed} is true if the code is currently being
-optimized for speed rather than size.
-
-If you don't define this, a reasonable default is used.
+This macro is deprecated and is only used to guide the default behaviours
+of @code{TARGET_MOVE_BY_PIECES_PROFITABLE_P} and
+@code{TARGET_MAX_SCALARIZATION_SIZE}. New ports should implement
+those hooks in preference to this macro.
 @end defmac
 
+@hook TARGET_MAX_SCALARIZATION_SIZE
+
 @defmac MOVE_BY_PIECES_P (@var{size}, @var{alignment})
 A C expression used to implement the default behaviour of
 @code{TARGET_MOVE_BY_PIECES_PROFITABLE_P}. New ports should implement
diff --git a/gcc/params.def b/gcc/params.def
index aefdd07..7b6c7e2 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -942,6 +942,18 @@ DEFPARAM (PARAM_TM_MAX_AGGREGATE_SIZE,
           "pairs",
           9, 0, 0)
 
+DEFPARAM (PARAM_SRA_MAX_SCALARIZATION_SIZE_SPEED,
+          "sra-max-scalarization-size-Ospeed",
+          "Maximum size, in storage units, of an aggregate which should be "
+          "considered for scalarization when compiling for speed",
+          0, 0, 0)
+
+DEFPARAM (PARAM_SRA_MAX_SCALARIZATION_SIZE_SIZE,
+          "sra-max-scalarization-size-Osize",
+          "Maximum size, in storage units, of an aggregate which should be "
+          "considered for scalarization when compiling for size",
+          0, 0, 0)
+
 DEFPARAM (PARAM_IPA_CP_VALUE_LIST_SIZE,
           "ipa-cp-value-list-size",
           "Maximum size of a list of values associated with each parameter for "
diff --git a/gcc/target.def b/gcc/target.def
index 10f3b2e..4e19845 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3049,6 +3049,24 @@ are the same as to this target hook.",
  int, (enum machine_mode mode, reg_class_t rclass, bool in),
  default_memory_move_cost)
 
+/* Return the maximum size in bytes of aggregate which will be considered
+   for replacement by SRA/IPA-SRA. */
+DEFHOOK
+(max_scalarization_size,
+ "This target hook is used by the Scalar Replacement of Aggregates passes\n\
+(SRA and IPA-SRA). This hook gives the maximum size, in storage units,\n\
+of aggregate to consider for replacement. @var{speed_p} is true if we are\n\
+currently compiling for speed.\n\
+\n\
+By default, the maximum scalarization size is determined by MOVE_RATIO,\n\
+if it is defined. Otherwise, a sensible default is chosen.\n\
+\n\
+Note that a user may choose to override this target hook with the\n\
+parameters @code{sra-max-scalarization-size-Ospeed} and\n\
+@code{sra-max-scalarization-size-Osize}.",
+ unsigned int, (bool speed_p),
+ default_max_scalarization_size)
+
 DEFHOOK
 (move_by_pieces_profitable_p,
  "GCC will attempt several strategies when asked to copy between\n\
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index eb0a4cd..abc94ff 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1421,6 +1421,15 @@ get_move_ratio (bool speed_p ATTRIBUTE_UNUSED)
   return move_ratio;
 }
 
+/* Return the maximum size, in storage units, of aggregate
+   which will be considered for replacement by SRA/IPA-SRA. */
+
+unsigned int
+default_max_scalarization_size (bool speed_p)
+{
+  return get_move_ratio (speed_p) * MOVE_MAX_PIECES;
+}
+
 /* The threshold of move insns below which the movmem optab is expanded or a
    call to memcpy is emitted. */
 
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index f76ad31..35467f8 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -181,6 +181,7 @@ extern int default_memory_move_cost (enum machine_mode, reg_class_t, bool);
 extern int default_register_move_cost (enum machine_mode, reg_class_t,
                                        reg_class_t);
+extern unsigned int default_max_scalarization_size (bool speed_p);
 extern bool default_move_by_pieces_profitable_p (unsigned int,
                                                  unsigned int, bool);
 extern unsigned int default_estimate_block_copy_ninsns (HOST_WIDE_INT, bool);
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 8259dba..c611d29 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2482,6 +2482,25 @@ propagate_all_subaccesses (void)
     }
 }
 
+/* Return the appropriate parameter value giving the maximum size of
+   aggregate (in storage units) to be considered for scalarization.
+   SPEED_P is true if we are currently optimizing for speed
+   rather than size. */
+
+unsigned int
+get_max_scalarization_size (bool speed_p)
+{
+  unsigned param_max_scalarization_size
+    = speed_p
+      ? PARAM_VALUE (PARAM_SRA_MAX_SCALARIZATION_SIZE_SPEED)
+      : PARAM_VALUE (PARAM_SRA_MAX_SCALARIZATION_SIZE_SIZE);
+
+  if (!param_max_scalarization_size)
+    return targetm.max_scalarization_size (speed_p);
+
+  return param_max_scalarization_size;
+}
+
 /* Go through all accesses collected throughout the (intraprocedural) analysis
    stage, exclude overlapping ones, identify representatives and build trees
    out of them, making decisions about scalarization on the way. Return true
@@ -2493,10 +2512,10 @@ analyze_all_variable_accesses (void)
   int res = 0;
   bitmap tmp = BITMAP_ALLOC (NULL);
   bitmap_iterator bi;
-  unsigned i, max_total_scalarization_size;
-
-  max_total_scalarization_size = UNITS_PER_WORD * BITS_PER_UNIT
-                                 * MOVE_RATIO (optimize_function_for_speed_p (cfun));
+  unsigned i;
+  unsigned int max_scalarization_size
+    = get_max_scalarization_size (optimize_function_for_speed_p (cfun))
+      * BITS_PER_UNIT;
 
   EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
     if (bitmap_bit_p (should_scalarize_away_bitmap, i)
@@ -2508,7 +2527,7 @@ analyze_all_variable_accesses (void)
             && type_consists_of_records_p (TREE_TYPE (var)))
           {
             if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
-                <= max_total_scalarization_size)
+                <= max_scalarization_size)
               {
                 completely_scalarize_var (var);
                 if (dump_file && (dump_flags & TDF_DETAILS))