On Fri, 2012-02-10 at 07:31 -0600, William J. Schmidt wrote:
> Richard, thanks.  I can answer most of your questions, but for the last
> one I will have to ask Ira to weigh in.
> 
> On Fri, 2012-02-10 at 13:06 +0100, Richard Guenther wrote:
> > On Thu, Feb 9, 2012 at 5:56 PM, William J. Schmidt
> > <wschm...@linux.vnet.ibm.com> wrote:
> > > Following is a revision of yesterday's PR50031 patch submission,
> > > modified per Richard's comments.  Bootstrapped and tested with no
> > > regressions on powerpc64-linux.  I've confirmed the same performance
> > > improvements in SPEC.  OK for trunk?
> > 
> > Some more questions - maybe Jakub can clarify as well given he worked
> > on patterns recently ...
> > 
> > > Thanks,
> > > Bill
> > >
> > >
> > > 2012-02-09  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
> > >            Ira Rosen  <i...@il.ibm.com>
> > >
> > >        PR tree-optimization/50031
> > >        * targhooks.c (default_builtin_vectorization_cost): Handle
> > >        vec_promote_demote.
> > >        * target.h (enum vect_cost_for_stmt): Add vec_promote_demote.
> > >        * tree-vect-loop.c (vect_get_single_scalar_iteraion_cost): Handle
> > >        all types of reduction and pattern statements.
> > >        (vect_estimate_min_profitable_iters): Likewise.
> > >        * tree-vect-stmts.c (vect_model_promotion_demotion_cost): New 
> > > function.
> > >        (vect_get_load_cost): Use vec_perm for permutations; add dump logic
> > >        for explicit realigns.
> > >        (vectorizable_conversion): Call vect_model_promotion_demotion_cost.
> > >        * config/spu/spu.c (spu_builtin_vectorization_cost): Handle
> > >        vec_promote_demote.
> > >        * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
> > >        * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): 
> > > Update
> > >        vec_perm for VSX and handle vec_promote_demote.
> > >
> > >
> > > Index: gcc/targhooks.c
> > > ===================================================================
> > > --- gcc/targhooks.c     (revision 183944)
> > > +++ gcc/targhooks.c     (working copy)
> > > @@ -514,6 +514,7 @@ default_builtin_vectorization_cost (enum vect_cost
> > >       case scalar_to_vec:
> > >       case cond_branch_not_taken:
> > >       case vec_perm:
> > > +      case vec_promote_demote:
> > >         return 1;
> > >
> > >       case unaligned_load:
> > > Index: gcc/target.h
> > > ===================================================================
> > > --- gcc/target.h        (revision 183944)
> > > +++ gcc/target.h        (working copy)
> > > @@ -145,7 +145,8 @@ enum vect_cost_for_stmt
> > >   scalar_to_vec,
> > >   cond_branch_not_taken,
> > >   cond_branch_taken,
> > > -  vec_perm
> > > +  vec_perm,
> > > +  vec_promote_demote
> > >  };
> > >
> > >  /* The target structure.  This holds all the backend hooks.  */
> > > Index: gcc/tree-vect-loop.c
> > > ===================================================================
> > > --- gcc/tree-vect-loop.c        (revision 183944)
> > > +++ gcc/tree-vect-loop.c        (working copy)
> > > @@ -2417,7 +2417,8 @@ vect_get_single_scalar_iteraion_cost (loop_vec_inf
> > >           if (stmt_info
> > >               && !STMT_VINFO_RELEVANT_P (stmt_info)
> > >               && (!STMT_VINFO_LIVE_P (stmt_info)
> > > -                  || STMT_VINFO_DEF_TYPE (stmt_info) != 
> > > vect_reduction_def))
> > > +                  || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE 
> > > (stmt_info)))
> > > +             && !STMT_VINFO_IN_PATTERN_P (stmt_info))
> > >             continue;
> > 
> > Why would we exclude !relevant stmts when they are part of a pattern?
> > We are looking at a scalar iteration cost, so all stmts that are not "dead"
> > count, no?
> 
> As I understand it, we're at a point where a statement replacing the
> pattern exists but has not yet been inserted in the code stream.  All of
> the statements in the pattern are marked irrelevant, but the related
> statement of the main (last) statement of the pattern is relevant.  Thus
> we need to allow the main statement through this check so the
> replacement statement can be found and counted.
> 
> > 
> > >
> > >           if (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt)))
> > > @@ -2564,15 +2565,48 @@ vect_estimate_min_profitable_iters (loop_vec_info
> > >        {
> > >          gimple stmt = gsi_stmt (si);
> > >          stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> > > +
> > > +         /* Translate the last statement in a pattern into the
> > > +            related replacement statement.  */
> > > +         if (STMT_VINFO_IN_PATTERN_P (stmt_info))
> > > +           {
> > > +             stmt = STMT_VINFO_RELATED_STMT (stmt_info);
> > > +             stmt_info = vinfo_for_stmt (stmt);
> > > +           }
> > 
> > So here we are tanslating stmt to the "main" scalar pattern stmt - and thus
> > count it as many times as we have stmts in that pattern?  That looks wrong.
> > More like
> > 
> >               if (STMT_VINFO_IN_PATTERN_P (stmt_info)
> >                   && STMT_VINFO_RELATED_STMT (stmt_info) != stmt)
> >                 continue;
> > 
> > ?  Does the main stmt has the flag set and points to itself?
> 
> From the commentary at the end of tree-vect-patterns.c, only the main
> statement in the pattern (the last one) is flagged as
> STMT_VINFO_IN_PATTERN_P.  So this is finding the new replacement
> statement which has been created but not yet inserted in the code.  It
> only gets counted once.
> 
> > 
> > >          /* Skip stmts that are not vectorized inside the loop.  */
> > >          if (!STMT_VINFO_RELEVANT_P (stmt_info)
> > >              && (!STMT_VINFO_LIVE_P (stmt_info)
> > > -                 || STMT_VINFO_DEF_TYPE (stmt_info) != 
> > > vect_reduction_def))
> > > +                 || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE 
> > > (stmt_info))))
> > >            continue;
> > > +
> > 
> > ... and then, what does STMT_VINFO_INSIDE_OF_LOOP_COST of
> > that "main" pattern stmt represent?  Shouldn't it represent the cost
> > of the whole sequence, and thus ...
> > 
> > >          vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * 
> > > factor;
> > >          /* FIXME: for stmts in the inner-loop in outer-loop 
> > > vectorization,
> > >             some of the "outside" costs are generated inside the 
> > > outer-loop.  */
> > >          vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
> > > +          if (is_pattern_stmt_p (stmt_info)
> > > +             && STMT_VINFO_PATTERN_DEF_SEQ (stmt_info))
> > > +            {
> > > +             gimple_stmt_iterator gsi;
> > 
> > The following is excessive?  Especially as you probably count the "main"
> > stmt twice?  Thus, if the main stmt cost does not cover the sequence
> > then you should skip adding its cost above?
> 
> This is what I believe is going on here.  Note that for the main pattern
> statement, we have converted it so that stmt_info now points to its
> related (replacement) statement.  It is apparently possible for this
> replacement statement to also be recognized as part of a new pattern
> (I've seen references to such possibilities elsewhere in the vectorizer
> code).  When that happens, this code is counting the costs of that
> pattern.
> 
> Ira, can you please weigh in on this part?  I'm not 100% certain of my
> explanation.

Also, I forgot to mention that I think Richard is right that the
replacement statement may be getting counted twice in this scenario. 

> 
> In any case, it's clear we need more comments in the code. :)
> 
> Note that STMT_VINFO_PATTERN_DEF_SEQ doesn't exist in the 4.6 branch, so
> this section has to be omitted if we backport it (which is desirable
> since the degradation was introduced in 4.6).  Removing it apparently
> does not affect the sphinx3 degradation.
> 
> > 
> > Thanks,
> > Richard.
> > 
> > > +             for (gsi = gsi_start (STMT_VINFO_PATTERN_DEF_SEQ 
> > > (stmt_info));
> > > +                  !gsi_end_p (gsi); gsi_next (&gsi))
> > > +                {
> > > +                  gimple pattern_def_stmt = gsi_stmt (gsi);
> > > +                  stmt_vec_info pattern_def_stmt_info
> > > +                   = vinfo_for_stmt (pattern_def_stmt);
> > > +                  if (STMT_VINFO_RELEVANT_P (pattern_def_stmt_info)
> > > +                      || STMT_VINFO_LIVE_P (pattern_def_stmt_info))
> > > +                   {
> > > +                      vec_inside_cost
> > > +                       += STMT_VINFO_INSIDE_OF_LOOP_COST
> > > +                          (pattern_def_stmt_info) * factor;
> > > +                      vec_outside_cost
> > > +                       += STMT_VINFO_OUTSIDE_OF_LOOP_COST
> > > +                          (pattern_def_stmt_info);
> > > +                    }
> > > +               }
> > > +           }
> > >        }
> > >     }
> > >
> > > Index: gcc/tree-vect-stmts.c
> > > ===================================================================
> > > --- gcc/tree-vect-stmts.c       (revision 183944)
> > > +++ gcc/tree-vect-stmts.c       (working copy)
> > > @@ -811,6 +811,46 @@ vect_model_simple_cost (stmt_vec_info stmt_info, i
> > >  }
> > >
> > >
> > > +/* Model cost for type demotion and promotion operations.  PWR is 
> > > normally
> > > +   zero for single-step promotions and demotions.  It will be one if
> > > +   two-step promotion/demotion is required, and so on.  Each additional
> > > +   step doubles the number of instructions required.  */
> > > +
> > > +static void
> > > +vect_model_promotion_demotion_cost (stmt_vec_info stmt_info,
> > > +                                   enum vect_def_type *dt, int pwr)
> > > +{
> > > +  int i, tmp;
> > > +  int inside_cost = 0, outside_cost = 0, single_stmt_cost;
> > > +
> > > +  /* The SLP costs were already calculated during SLP tree build.  */
> > > +  if (PURE_SLP_STMT (stmt_info))
> > > +    return;
> > > +
> > > +  single_stmt_cost = vect_get_stmt_cost (vec_promote_demote);
> > > +  for (i = 0; i < pwr + 1; i++)
> > > +    {
> > > +      tmp = (STMT_VINFO_TYPE (stmt_info) == 
> > > type_promotion_vec_info_type) ?
> > > +       (i + 1) : i;
> > > +      inside_cost += vect_pow2 (tmp) * single_stmt_cost;
> > > +    }
> > > +
> > > +  /* FORNOW: Assuming maximum 2 args per stmts.  */
> > > +  for (i = 0; i < 2; i++)
> > > +    {
> > > +      if (dt[i] == vect_constant_def || dt[i] == vect_external_def)
> > > +        outside_cost += vect_get_stmt_cost (vector_stmt);
> > > +    }
> > > +
> > > +  if (vect_print_dump_info (REPORT_COST))
> > > +    fprintf (vect_dump, "vect_model_promotion_demotion_cost: inside_cost 
> > > = %d, "
> > > +             "outside_cost = %d .", inside_cost, outside_cost);
> > > +
> > > +  /* Set the costs in STMT_INFO.  */
> > > +  stmt_vinfo_set_inside_of_loop_cost (stmt_info, NULL, inside_cost);
> > > +  stmt_vinfo_set_outside_of_loop_cost (stmt_info, NULL, outside_cost);
> > > +}
> > > +
> > >  /* Function vect_cost_strided_group_size
> > >
> > >    For strided load or store, return the group_size only if it is the 
> > > first
> > > @@ -887,7 +927,6 @@ vect_model_store_cost (stmt_vec_info stmt_info, in
> > >       if (vect_print_dump_info (REPORT_COST))
> > >         fprintf (vect_dump, "vect_model_store_cost: strided group_size = 
> > > %d .",
> > >                  group_size);
> > > -
> > >     }
> > >
> > >   /* Costs of the stores.  */
> > > @@ -1049,7 +1088,7 @@ vect_get_load_cost (struct data_reference *dr, int
> > >     case dr_explicit_realign:
> > >       {
> > >         *inside_cost += ncopies * (2 * vect_get_stmt_cost (vector_load)
> > > -           + vect_get_stmt_cost (vector_stmt));
> > > +                                  + vect_get_stmt_cost (vec_perm));
> > >
> > >         /* FIXME: If the misalignment remains fixed across the iterations 
> > > of
> > >            the containing loop, the following cost should be added to the
> > > @@ -1057,6 +1096,9 @@ vect_get_load_cost (struct data_reference *dr, int
> > >         if (targetm.vectorize.builtin_mask_for_load)
> > >           *inside_cost += vect_get_stmt_cost (vector_stmt);
> > >
> > > +        if (vect_print_dump_info (REPORT_COST))
> > > +          fprintf (vect_dump, "vect_model_load_cost: explicit realign");
> > > +
> > >         break;
> > >       }
> > >     case dr_explicit_realign_optimized:
> > > @@ -1080,7 +1122,12 @@ vect_get_load_cost (struct data_reference *dr, int
> > >           }
> > >
> > >         *inside_cost += ncopies * (vect_get_stmt_cost (vector_load)
> > > -          + vect_get_stmt_cost (vector_stmt));
> > > +                                  + vect_get_stmt_cost (vec_perm));
> > > +
> > > +        if (vect_print_dump_info (REPORT_COST))
> > > +          fprintf (vect_dump,
> > > +                  "vect_model_load_cost: explicit realign optimized");
> > > +
> > >         break;
> > >       }
> > >
> > > @@ -2392,16 +2439,19 @@ vectorizable_conversion (gimple stmt, gimple_stmt_
> > >       if (vect_print_dump_info (REPORT_DETAILS))
> > >        fprintf (vect_dump, "=== vectorizable_conversion ===");
> > >       if (code == FIX_TRUNC_EXPR || code == FLOAT_EXPR)
> > > -       STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type;
> > > +        {
> > > +         STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type;
> > > +         vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
> > > +       }
> > >       else if (modifier == NARROW)
> > >        {
> > >          STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type;
> > > -         vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
> > > +         vect_model_promotion_demotion_cost (stmt_info, dt, 
> > > multi_step_cvt);
> > >        }
> > >       else
> > >        {
> > >          STMT_VINFO_TYPE (stmt_info) = type_promotion_vec_info_type;
> > > -         vect_model_simple_cost (stmt_info, 2 * ncopies, dt, NULL);
> > > +         vect_model_promotion_demotion_cost (stmt_info, dt, 
> > > multi_step_cvt);
> > >        }
> > >       VEC_free (tree, heap, interm_types);
> > >       return true;
> > > Index: gcc/config/spu/spu.c
> > > ===================================================================
> > > --- gcc/config/spu/spu.c        (revision 183944)
> > > +++ gcc/config/spu/spu.c        (working copy)
> > > @@ -6920,6 +6920,7 @@ spu_builtin_vectorization_cost (enum vect_cost_for
> > >       case scalar_to_vec:
> > >       case cond_branch_not_taken:
> > >       case vec_perm:
> > > +      case vec_promote_demote:
> > >         return 1;
> > >
> > >       case scalar_store:
> > > Index: gcc/config/i386/i386.c
> > > ===================================================================
> > > --- gcc/config/i386/i386.c      (revision 183944)
> > > +++ gcc/config/i386/i386.c      (working copy)
> > > @@ -35336,6 +35336,7 @@ ix86_builtin_vectorization_cost (enum vect_cost_fo
> > >         return ix86_cost->cond_not_taken_branch_cost;
> > >
> > >       case vec_perm:
> > > +      case vec_promote_demote:
> > >         return ix86_cost->vec_stmt_cost;
> > >
> > >       default:
> > > Index: gcc/config/rs6000/rs6000.c
> > > ===================================================================
> > > --- gcc/config/rs6000/rs6000.c  (revision 183944)
> > > +++ gcc/config/rs6000/rs6000.c  (working copy)
> > > @@ -3543,10 +3543,17 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
> > >         return 1;
> > >
> > >       case vec_perm:
> > > -       if (!TARGET_VSX)
> > > +       if (TARGET_VSX)
> > > +         return 4;
> > > +       else
> > >          return 1;
> > > -       return 2;
> > >
> > > +      case vec_promote_demote:
> > > +        if (TARGET_VSX)
> > > +          return 5;
> > > +        else
> > > +          return 1;
> > > +
> > >       case cond_branch_taken:
> > >         return 3;
> > >
> > >
> > >
> > 

Reply via email to