On Tue, 29 Oct 2019, Jiufu Guo wrote:

> Jiufu Guo <guoji...@linux.ibm.com> writes:
> 
> > Hi,
> >
> > In previous patch r277501, which is changing PARAM_MAX_UNROLL_TIMES
> > and PARAM_MAX_UNROLLED_INSNS values during option overriding, while
> > it would better to use target loop unroll adjust hook. The hook can
> > also help to do further target related hueristic adjust.
> > This patch is adding target loop unroll adjust hook for rs6000 to
> > impliment previous behavior.
> >
> > Bootstrapped and regtested on powerpc64le. Is this ok for trunk?
> >
> > A combined patch is listed at the end of this mail for this and r277501.
> > If you want to review it as a whole, it can be referenced.
> >
> > Jiufu Guo
> > BR
> >
> >
> > gcc/
> > 2019-10-29  Jiufu Guo  <guoji...@linux.ibm.com>         
> >
> >     PR tree-optimization/88760
> >     * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
> >     changes to PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
> >     (TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
> >     (rs6000_loop_unroll_adjust): New hook for loop unroll adjust. Unrolling
> >     small loop 2 times if funroll-loops enabled implicitly.
> >
> >     
> > gcc.testsuite/
> > 2019-10-29  Jiufu Guo  <guoji...@linux.ibm.com>
> >
> >     PR tree-optimization/88760
> >     * gcc.dg/pr59643.c: Update back to r277550.
> >
> > Index: gcc/config/rs6000/rs6000.c
> > ===================================================================
> > --- gcc/config/rs6000/rs6000.c      (revision 277550)
> > +++ gcc/config/rs6000/rs6000.c      (working copy)
> > @@ -1428,6 +1428,9 @@ static const struct attribute_spec rs6000_attribut
> >  #undef TARGET_VECTORIZE_DESTROY_COST_DATA
> >  #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
> >  
> > +#undef TARGET_LOOP_UNROLL_ADJUST
> > +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
> > +
> >  #undef TARGET_INIT_BUILTINS
> >  #define TARGET_INIT_BUILTINS rs6000_init_builtins
> >  #undef TARGET_BUILTIN_DECL
> > @@ -4540,20 +4543,11 @@ rs6000_option_override_internal (bool global_init_
> >                          global_options.x_param_values,
> >                          global_options_set.x_param_values);
> >  
> > -      /* unroll very small loops 2 time if no -funroll-loops.  */
> > +      /* If funroll-loops is implicitly enabled, do not turn fweb or
> > +    frename-registers on implicitly.  */
> >        if (!global_options_set.x_flag_unroll_loops
> >       && !global_options_set.x_flag_unroll_all_loops)
> >     {
> > -     maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
> > -                            global_options.x_param_values,
> > -                            global_options_set.x_param_values);
> > -
> > -     maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
> > -                            global_options.x_param_values,
> > -                            global_options_set.x_param_values);
> > -
> > -     /* If fweb or frename-registers are not specificed in command-line,
> > -        do not turn them on implicitly.  */
> >       if (!global_options_set.x_flag_web)
> >         global_options.x_flag_web = 0;
> >       if (!global_options_set.x_flag_rename_registers)
> > @@ -5101,6 +5095,29 @@ rs6000_destroy_cost_data (void *data)
> >    free (data);
> >  }
> >  
> > +/* This target hook implementation for TARGET_LOOP_UNROLL_ADJUST calculates
> > +   a new heristic number struct loop *loop should be unrolled.  */
> > +
> > +static unsigned
> > +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
> > +{
> > +  /* If funroll-loops was enabled implicitly, like at -O2, we only unroll
> > +     those small loops with small unroll factor.  */
> > +  if (!global_options_set.x_flag_unroll_loops
> > +      && !global_options_set.x_flag_unroll_all_loops)
> Here, still checking global option setting: x_flag_unroll_loops to avoid
> changing the behavior of -funroll-loops which come from command line.
> 
> And then only enhancing the behavior for implicit funroll-loops (by -O2/3
> level).
> 
> We would remove this checking after heuristic refine when it can benefits
> explicitly -funroll-loops.
>
> > +    {
> > +      /* If the loop contains few insns, treated it as small loops.
> > +    TODO: Uing 10 hard code for now.  Continue to refine, For example,
> > +    if loop constians only 1-2 insns, we may unroll more times(4).
> > +    And we may use PARAM to control kinds of loop size.  */
> > +      if (loop->ninsns <= 10)
> > +   return 2;
> > +      else
> > +   return 0;
> > +    }
> > +  return nunroll;
> > +}
> > +
> For this patch at -flto, loop-unrolling is affected by command line
> besides optimize level per function. 
> -flto -c t.c -O2 -funroll-loops
> -flto  t.o -O2 or -flto t.o,
> this linking will not be treated as explicit -funroll-loops. 

That's because you check global_options_set - you should never do this
in hooks invoked by optimization passes.  IIRC some discussion in
a thread elsewhere that even with -flto you can manage to see whether
flag_unroll_loop was enabled explicitely or implicitely - not sure
if it was for power, s390 or aarch64.

Richard.
 
> >  /* Handler for the Mathematical Acceleration Subsystem (mass) interface to 
> > a
> >     library with vectorized intrinsics.  */
> >  
> > Index: gcc/testsuite/gcc.dg/pr59643.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/pr59643.c  (revision 277550)
> > +++ gcc/testsuite/gcc.dg/pr59643.c  (working copy)
> > @@ -1,9 +1,6 @@
> >  /* PR tree-optimization/59643 */
> >  /* { dg-do compile } */
> >  /* { dg-options "-O3 -fdump-tree-pcom-details" } */
> > -/* { dg-additional-options "--param max-unrolled-insns=400" { target { 
> > powerpc*-*-* } } } */
> > -/* Implicit threashold of max-unrolled-insn on ppc at O3 is too small for 
> > the
> > -   loop of this case.  */
> >  
> >  void
> >  foo (double *a, double *b, double *c, double d, double e, int n)
> >
> >
> > -------- combined patch---------
> > gcc/ChangeLog
> > 2019-10-25  Jiufu Guo  <guoji...@linux.ibm.com>         
> >
> >     PR tree-optimization/88760
> >     * config/rs6000/rs6000-common.c (rs6000_option_optimization_table):
> >     Enable -funroll-loops for -O2 and above.
> >     * config/rs6000/rs6000.c (rs6000_option_override_internal): Avoid turn
> >     on web and rnreg implicitly.
> >     (TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
> >     (rs6000_loop_unroll_adjust): New hook for loop unroll adjust. Unrolling
> >     small loop 2 times if funroll-loops enabled implicitly.
> >
> >
> > gcc/testsuite/ChangeLog
> > 2019-10-25  Jiufu Guo  <guoji...@linux.ibm.com>
> >
> >     PR tree-optimization/88760
> >     * gcc.target/powerpc/small-loop-unroll.c: New test.
> >     * c-c++-common/tsan/thread_leak2.c: Update test.
> >     * gcc.target/powerpc/loop_align.c: Update test.
> >     * gcc.target/powerpc/ppc-fma-1.c: Update test.
> >     * gcc.target/powerpc/ppc-fma-2.c: Update test.
> >     * gcc.target/powerpc/ppc-fma-3.c: Update test.
> >     * gcc.target/powerpc/ppc-fma-4.c: Update test.
> >     * gcc.target/powerpc/pr78604.c: Update test.
> >
> >
> > diff --git a/gcc/common/config/rs6000/rs6000-common.c 
> > b/gcc/common/config/rs6000/rs6000-common.c
> > index 4b0c205..b947196 100644
> > --- a/gcc/common/config/rs6000/rs6000-common.c
> > +++ b/gcc/common/config/rs6000/rs6000-common.c
> > @@ -35,6 +35,7 @@ static const struct default_options 
> > rs6000_option_optimization_table[] =
> >      { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
> >      /* Enable -fsched-pressure for first pass instruction scheduling.  */
> >      { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
> > +    { OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
> >      { OPT_LEVELS_NONE, 0, NULL, 0 }
> >    };
> >  
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 1399221..28ffa15 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -1428,6 +1428,9 @@ static const struct attribute_spec 
> > rs6000_attribute_table[] =
> >  #undef TARGET_VECTORIZE_DESTROY_COST_DATA
> >  #define TARGET_VECTORIZE_DESTROY_COST_DATA rs6000_destroy_cost_data
> >  
> > +#undef TARGET_LOOP_UNROLL_ADJUST
> > +#define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
> > +
> >  #undef TARGET_INIT_BUILTINS
> >  #define TARGET_INIT_BUILTINS rs6000_init_builtins
> >  #undef TARGET_BUILTIN_DECL
> > @@ -4540,6 +4543,17 @@ rs6000_option_override_internal (bool global_init_p)
> >                          global_options.x_param_values,
> >                          global_options_set.x_param_values);
> >  
> > +      /* If funroll-loops is implicitly enabled, do not turn fweb or
> > +    frename-registers on implicitly.  */
> > +      if (!global_options_set.x_flag_unroll_loops
> > +     && !global_options_set.x_flag_unroll_all_loops)
> > +   {
> > +     if (!global_options_set.x_flag_web)
> > +       global_options.x_flag_web = 0;
> > +     if (!global_options_set.x_flag_rename_registers)
> > +       global_options.x_flag_rename_registers = 0;
> > +   }
> > +
> >        /* If using typedef char *va_list, signal that
> >      __builtin_va_start (&ap, 0) can be optimized to
> >      ap = __builtin_next_arg (0).  */
> > @@ -5081,6 +5095,29 @@ rs6000_destroy_cost_data (void *data)
> >    free (data);
> >  }
> >  
> > +/* This target hook implementation for TARGET_LOOP_UNROLL_ADJUST calculates
> > +   a new heristic number struct loop *loop should be unrolled.  */
> > +
> > +static unsigned
> > +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
> > +{
> > +  /* If funroll-loops was enabled implicitly, like at -O2, we only unroll
> > +     those small loops with small unroll factor.  */
> > +  if (!global_options_set.x_flag_unroll_loops
> > +      && !global_options_set.x_flag_unroll_all_loops)
> > +    {
> > +      /* If the loop contains few insns, treated it as small loops.
> > +    TODO: Uing 10 hard code for now.  Continue to refine, For example,
> > +    if loop constians only 1-2 insns, we may unroll more times(4).
> > +    And we may use PARAM to control kinds of loop size.  */
> > +      if (loop->ninsns <= 10)
> > +   return 2;
> > +      else
> > +   return 0;
> > +    }
> > +  return nunroll;
> > +}
> > +
> >  /* Handler for the Mathematical Acceleration Subsystem (mass) interface to 
> > a
> >     library with vectorized intrinsics.  */
> >  
> > diff --git a/gcc/testsuite/c-c++-common/tsan/thread_leak2.c 
> > b/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
> > index c9b8046..082f2aa 100644
> > --- a/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
> > +++ b/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
> > @@ -1,5 +1,9 @@
> >  /* { dg-shouldfail "tsan" } */
> >  
> > +/* { dg-additional-options "-fno-unroll-loops" { target { powerpc*-*-* } } 
> > } */
> > +/* -fno-unroll-loops help to avoid ThreadSanitizer reporting multi-times
> > +   message for pthread_create at difference calling addresses.  */
> > +
> >  #include <pthread.h>
> >  #include <unistd.h>
> >  
> > diff --git a/gcc/testsuite/gcc.target/powerpc/loop_align.c 
> > b/gcc/testsuite/gcc.target/powerpc/loop_align.c
> > index ebe3782..ef67f77 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/loop_align.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/loop_align.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile { target { powerpc*-*-* } } } */
> >  /* { dg-skip-if "" { powerpc*-*-darwin* powerpc-ibm-aix* } } */
> > -/* { dg-options "-O2 -mdejagnu-cpu=power7 -falign-functions=16" } */
> > +/* { dg-options "-O2 -mdejagnu-cpu=power7 -falign-functions=16 
> > -fno-unroll-loops" } */
> >  /* { dg-final { scan-assembler ".p2align 5" } } */
> >  
> >  void f(double *a, double *b, double *c, unsigned long n) {
> > diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c 
> > b/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c
> > index b4945e6..2a5b92c 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c
> > @@ -1,7 +1,7 @@
> >  /* { dg-do compile { target { powerpc*-*-* } } } */
> >  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
> >  /* { dg-require-effective-target powerpc_vsx_ok } */
> > -/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math" } 
> > */
> > +/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math 
> > -fno-unroll-loops" } */
> >  /* { dg-final { scan-assembler-times "xvmadd" 4 } } */
> >  /* { dg-final { scan-assembler-times "xsmadd\|fmadd\ " 2 } } */
> >  /* { dg-final { scan-assembler-times "fmadds" 2 } } */
> > diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c 
> > b/gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c
> > index 5ed630a..bf2c67f 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c
> > @@ -1,7 +1,7 @@
> >  /* { dg-do compile { target { powerpc*-*-* } } } */
> >  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
> >  /* { dg-require-effective-target powerpc_vsx_ok } */
> > -/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math 
> > -ffp-contract=off" } */
> > +/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math 
> > -ffp-contract=off -fno-unroll-loops" } */
> >  /* { dg-final { scan-assembler-times "xvmadd" 2 } } */
> >  /* { dg-final { scan-assembler-times "xsmadd\|fmadd\ " 1 } } */
> >  /* { dg-final { scan-assembler-times "fmadds" 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c 
> > b/gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c
> > index ef252b3..8608116 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c
> > @@ -2,7 +2,7 @@
> >  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
> >  /* { dg-require-effective-target powerpc_altivec_ok } */
> >  /* { dg-require-effective-target powerpc_fprs } */
> > -/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power6 -maltivec 
> > -ffast-math" } */
> > +/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power6 -maltivec 
> > -ffast-math -fno-unroll-loops" } */
> >  /* { dg-final { scan-assembler-times "vmaddfp" 2 } } */
> >  /* { dg-final { scan-assembler-times "fmadd " 2 } } */
> >  /* { dg-final { scan-assembler-times "fmadds" 2 } } */
> > diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c 
> > b/gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c
> > index c2eaf1a..291c2ee 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c
> > @@ -2,7 +2,7 @@
> >  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
> >  /* { dg-require-effective-target powerpc_altivec_ok } */
> >  /* { dg-require-effective-target powerpc_fprs } */
> > -/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power6 -maltivec 
> > -ffast-math -ffp-contract=off" } */
> > +/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power6 -maltivec 
> > -ffast-math -ffp-contract=off -fno-unroll-loops" } */
> >  /* { dg-final { scan-assembler-times "vmaddfp" 1 } } */
> >  /* { dg-final { scan-assembler-times "fmadd " 1 } } */
> >  /* { dg-final { scan-assembler-times "fmadds" 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/powerpc/pr78604.c 
> > b/gcc/testsuite/gcc.target/powerpc/pr78604.c
> > index 76d8945..35bfdb3 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/pr78604.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr78604.c
> > @@ -1,7 +1,7 @@
> >  /* { dg-do compile { target { powerpc*-*-* } } } */
> >  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
> >  /* { dg-require-effective-target powerpc_p8vector_ok } */
> > -/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize 
> > -fdump-tree-vect-details" } */
> > +/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize 
> > -fdump-tree-vect-details -fno-unroll-loops" } */
> >  
> >  #ifndef SIZE
> >  #define SIZE 1024
> > diff --git a/gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c 
> > b/gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c
> > new file mode 100644
> > index 0000000..fec5ae9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-rtl-loop2_unroll" } */
> > +
> > +void __attribute__ ((noinline)) foo(int n, int *arr)
> > +{
> > +  int i;
> > +  for (i = 0; i < n; i++)
> > +    arr[i] = arr[i] - 10;
> > +}
> > +/* { dg-final { scan-rtl-dump-times "Unrolled loop 1 times" 1 
> > "loop2_unroll" } } */
> > +/* { dg-final { scan-assembler-times {\mlwz\M} 3 } } */
> > +/* { dg-final { scan-assembler-times {\mstw\M} 3 } } */
> > +
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Reply via email to