Jiufu Guo <guoji...@linux.ibm.com> writes: Hi Segher,
I updated the patch for option name at the end of this mail. Thanks for review in advance. > Jiufu Guo <guoji...@linux.ibm.com> writes: > >> Segher Boessenkool <seg...@kernel.crashing.org> writes: >> >>> Hi! >>> >>> On Tue, Nov 05, 2019 at 04:33:23PM +0800, Jiufu Guo wrote: >>>> --- gcc/common/config/rs6000/rs6000-common.c (revision 277765) >>>> +++ gcc/common/config/rs6000/rs6000-common.c (working copy) >>>> @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_ >>>> { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 }, >>>> /* Enable -fsched-pressure for first pass instruction scheduling. */ >>>> { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, >>>> - { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 }, >>>> + /* Enable -funroll-loops with -munroll-small-loops. */ >>>> + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, >>>> + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 }, >>> >>> I guess the comment should say what we enable here more than the generic >>> code does. Something like >>> >>> /* Enable -funroll-loops at -O2 already. Also enable >>> -munroll-small-loops. */ >> >> updated to: >> /* Enable -munroll-only-small-loops with -funroll-loops to unroll small >> loops at -O2 and above by default. */ >> { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, >> { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 }, >> /* Disable -fweb and -frename-registers to avoid bad impacts. */ >> { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 }, >> { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 }, > /* Enable -munroll-only-small-loops with -funroll-loops to unroll small > loops at -O2 and above by default. */ > { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, > { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 }, > /* -fweb and -frename-registers are useless in general, turn them off. */ > { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 }, > { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 }, > > A little better? > Updated patch is attached at the end of this mail, maybe it is easy for > review. :) > > Jiufu, > BR. > >> >> Thanks for more comments to make it better! >> >>> >>>> + /* Explicit funroll-loops turns -munroll-small-loops off. >>>> + Implicit funroll-loops does not turn fweb or frename-registers on. */ >>>> + if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops) >>>> + || (global_options_set.x_flag_unroll_all_loops >>>> + && flag_unroll_all_loops)) >>>> { >>>> + if (!global_options_set.x_unroll_small_loops) >>>> + unroll_small_loops = 0; >>>> + } >>>> + else >>>> + { >>>> if (!global_options_set.x_flag_web) >>>> + flag_web = 0; >>>> if (!global_options_set.x_flag_rename_registers) >>>> + flag_rename_registers = 0; >>>> } >>> >>> So unroll-small-loops should better be called unroll-only-small-loops? >> Thanks again. Right, unroll-only-small-loops is better. >>> >>> Why does explicit unroll-loops turns on web and rnreg? Why only explicit? >>> Isn't it good and/or bad in all the same cases, implicit and explicit? >> Good question! >> >> Turn off them by default, because they do not help too much for generic >> cases, and did not see performance gain on SPEC2017. And turning them >> off will help to consistent with previous -O2/-O3 which does not turn >> them on. This could avoid regressions in test cases. >> If do not turn them on with -funroll-loops, user may see performance >> difference on some cases. For example, in SPEC peak which option >> contains -funroll-loops, it may need to add -frename-registers manually >> for some benchmarks. >> >> Any sugguestions? Do you think it is a good idea to disable them by >> default, and let user to add them when they are helpful? e.g. add them >> for some benchmarks at `peak`. >> >>> >>>> +munroll-small-loops >>>> +Target Undocumented Var(unroll_small_loops) Init(0) Save >>>> +Use conservative small loop unrolling. >>> >>> Undocumented means undocumented, so you don't have a comment string in >>> here. But you can comment it: >>> >>> ; Use conservative small loop unrolling. >> Thanks again for you kindly review! >> >> Jiufu, >> >> BR. >>> >>> >>> Segher gcc/ 2019-11-07 Jiufu Guo <guoji...@linux.ibm.com> PR tree-optimization/88760 * gcc/config/rs6000/rs6000.opt (-munroll-only-small-loops): New option. * gcc/common/config/rs6000/rs6000-common.c (rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]: Turn on -funroll-loops and -munroll-only-small-loops. [OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers. * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS. Turn off -munroll-only-small-loops, turn on -fweb and -frename-registers for explicit -funroll-loops. (TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook. (rs6000_loop_unroll_adjust): Define it. Use -munroll-only-small-loops. gcc.testsuite/ 2019-11-07 Jiufu Guo <guoji...@linux.ibm.com> PR tree-optimization/88760 * gcc.dg/pr59643.c: Update back to r277550. Index: gcc/common/config/rs6000/rs6000-common.c =================================================================== --- gcc/common/config/rs6000/rs6000-common.c (revision 277871) +++ gcc/common/config/rs6000/rs6000-common.c (working copy) @@ -35,7 +35,13 @@ static const struct default_options rs6000_option_ { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 }, /* Enable -fsched-pressure for first pass instruction scheduling. */ { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, - { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 }, + /* Enable -munroll-only-small-loops with -funroll-loops to unroll small + loops at -O2 and above by default. */ + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 }, + { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 }, + /* -fweb and -frename-registers are useless in general, turn them off. */ + { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 }, + { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 }, { OPT_LEVELS_NONE, 0, NULL, 0 } }; Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 277871) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut #undef TARGET_VECTORIZE_DESTROY_COST_DATA #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data +#undef TARGET_LOOP_UNROLL_ADJUST +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust + #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS rs6000_init_builtins #undef TARGET_BUILTIN_DECL @@ -4540,24 +4543,18 @@ rs6000_option_override_internal (bool global_init_ global_options.x_param_values, global_options_set.x_param_values); - /* unroll very small loops 2 time if no -funroll-loops. */ - if (!global_options_set.x_flag_unroll_loops - && !global_options_set.x_flag_unroll_all_loops) + /* Explicit -funroll-loops turns -munroll-only-small-loops off, and + turns fweb or frename-registers on. */ + if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops) + || (global_options_set.x_flag_unroll_all_loops + && flag_unroll_all_loops)) { - maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2, - global_options.x_param_values, - global_options_set.x_param_values); - - maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20, - global_options.x_param_values, - global_options_set.x_param_values); - - /* If fweb or frename-registers are not specificed in command-line, - do not turn them on implicitly. */ + if (!global_options_set.x_unroll_only_small_loops) + unroll_only_small_loops = 0; if (!global_options_set.x_flag_web) - global_options.x_flag_web = 0; + flag_web = 1; if (!global_options_set.x_flag_rename_registers) - global_options.x_flag_rename_registers = 0; + flag_rename_registers = 1; } /* If using typedef char *va_list, signal that @@ -5101,6 +5098,25 @@ rs6000_destroy_cost_data (void *data) free (data); } +/* Implement targetm.loop_unroll_adjust. */ + +static unsigned +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop) +{ + if (unroll_only_small_loops) + { + /* TODO: This is hardcoded to 10 right now. It can be refined, for + example we may want to unroll very small loops more times (4 perhaps). + We also should use a PARAM for this. */ + if (loop->ninsns <= 10) + return MIN (2, nunroll); + else + return 0; + } + + return nunroll; +} + /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a library with vectorized intrinsics. */ Index: gcc/config/rs6000/rs6000.opt =================================================================== --- gcc/config/rs6000/rs6000.opt (revision 277871) +++ gcc/config/rs6000/rs6000.opt (working copy) @@ -501,6 +501,10 @@ moptimize-swaps Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save Analyze and remove doubleword swaps from VSX computations. +munroll-only-small-loops +Target Undocumented Var(unroll_only_small_loops) Init(0) Save +; Use conservative small loop unrolling. + mpower9-misc Target Undocumented Report Mask(P9_MISC) Var(rs6000_isa_flags) Use certain scalar instructions added in ISA 3.0. Index: gcc/testsuite/gcc.dg/pr59643.c =================================================================== --- gcc/testsuite/gcc.dg/pr59643.c (revision 277871) +++ gcc/testsuite/gcc.dg/pr59643.c (working copy) @@ -1,9 +1,6 @@ /* PR tree-optimization/59643 */ /* { dg-do compile } */ /* { dg-options "-O3 -fdump-tree-pcom-details" } */ -/* { dg-additional-options "--param max-unrolled-insns=400" { target { powerpc*-*-* } } } */ -/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for the - loop of this case. */ void foo (double *a, double *b, double *c, double d, double e, int n)