On Fri, 2012-02-10 at 07:31 -0600, William J. Schmidt wrote: > Richard, thanks. I can answer most of your questions, but for the last > one I will have to ask Ira to weigh in. > > On Fri, 2012-02-10 at 13:06 +0100, Richard Guenther wrote: > > On Thu, Feb 9, 2012 at 5:56 PM, William J. Schmidt > > <wschm...@linux.vnet.ibm.com> wrote: > > > Following is a revision of yesterday's PR50031 patch submission, > > > modified per Richard's comments. Bootstrapped and tested with no > > > regressions on powerpc64-linux. I've confirmed the same performance > > > improvements in SPEC. OK for trunk? > > > > Some more questions - maybe Jakub can clarify as well given he worked > > on patterns recently ... > > > > > Thanks, > > > Bill > > > > > > > > > 2012-02-09 Bill Schmidt <wschm...@linux.vnet.ibm.com> > > > Ira Rosen <i...@il.ibm.com> > > > > > > PR tree-optimization/50031 > > > * targhooks.c (default_builtin_vectorization_cost): Handle > > > vec_promote_demote. > > > * target.h (enum vect_cost_for_stmt): Add vec_promote_demote. > > > * tree-vect-loop.c (vect_get_single_scalar_iteraion_cost): Handle > > > all types of reduction and pattern statements. > > > (vect_estimate_min_profitable_iters): Likewise. > > > * tree-vect-stmts.c (vect_model_promotion_demotion_cost): New > > > function. > > > (vect_get_load_cost): Use vec_perm for permutations; add dump logic > > > for explicit realigns. > > > (vectorizable_conversion): Call vect_model_promotion_demotion_cost. > > > * config/spu/spu.c (spu_builtin_vectorization_cost): Handle > > > vec_promote_demote. > > > * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise. > > > * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): > > > Update > > > vec_perm for VSX and handle vec_promote_demote. > > > > > > > > > Index: gcc/targhooks.c > > > =================================================================== > > > --- gcc/targhooks.c (revision 183944) > > > +++ gcc/targhooks.c (working copy) > > > @@ -514,6 +514,7 @@ default_builtin_vectorization_cost (enum vect_cost > > > case scalar_to_vec: > > > case cond_branch_not_taken: > > > case vec_perm: > > > + case vec_promote_demote: > > > return 1; > > > > > > case unaligned_load: > > > Index: gcc/target.h > > > =================================================================== > > > --- gcc/target.h (revision 183944) > > > +++ gcc/target.h (working copy) > > > @@ -145,7 +145,8 @@ enum vect_cost_for_stmt > > > scalar_to_vec, > > > cond_branch_not_taken, > > > cond_branch_taken, > > > - vec_perm > > > + vec_perm, > > > + vec_promote_demote > > > }; > > > > > > /* The target structure. This holds all the backend hooks. */ > > > Index: gcc/tree-vect-loop.c > > > =================================================================== > > > --- gcc/tree-vect-loop.c (revision 183944) > > > +++ gcc/tree-vect-loop.c (working copy) > > > @@ -2417,7 +2417,8 @@ vect_get_single_scalar_iteraion_cost (loop_vec_inf > > > if (stmt_info > > > && !STMT_VINFO_RELEVANT_P (stmt_info) > > > && (!STMT_VINFO_LIVE_P (stmt_info) > > > - || STMT_VINFO_DEF_TYPE (stmt_info) != > > > vect_reduction_def)) > > > + || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE > > > (stmt_info))) > > > + && !STMT_VINFO_IN_PATTERN_P (stmt_info)) > > > continue; > > > > Why would we exclude !relevant stmts when they are part of a pattern? > > We are looking at a scalar iteration cost, so all stmts that are not "dead" > > count, no? > > As I understand it, we're at a point where a statement replacing the > pattern exists but has not yet been inserted in the code stream. All of > the statements in the pattern are marked irrelevant, but the related > statement of the main (last) statement of the pattern is relevant. Thus > we need to allow the main statement through this check so the > replacement statement can be found and counted. > > > > > > > > > if (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt))) > > > @@ -2564,15 +2565,48 @@ vect_estimate_min_profitable_iters (loop_vec_info > > > { > > > gimple stmt = gsi_stmt (si); > > > stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > > > + > > > + /* Translate the last statement in a pattern into the > > > + related replacement statement. */ > > > + if (STMT_VINFO_IN_PATTERN_P (stmt_info)) > > > + { > > > + stmt = STMT_VINFO_RELATED_STMT (stmt_info); > > > + stmt_info = vinfo_for_stmt (stmt); > > > + } > > > > So here we are tanslating stmt to the "main" scalar pattern stmt - and thus > > count it as many times as we have stmts in that pattern? That looks wrong. > > More like > > > > if (STMT_VINFO_IN_PATTERN_P (stmt_info) > > && STMT_VINFO_RELATED_STMT (stmt_info) != stmt) > > continue; > > > > ? Does the main stmt has the flag set and points to itself? > > From the commentary at the end of tree-vect-patterns.c, only the main > statement in the pattern (the last one) is flagged as > STMT_VINFO_IN_PATTERN_P. So this is finding the new replacement > statement which has been created but not yet inserted in the code. It > only gets counted once. > > > > > > /* Skip stmts that are not vectorized inside the loop. */ > > > if (!STMT_VINFO_RELEVANT_P (stmt_info) > > > && (!STMT_VINFO_LIVE_P (stmt_info) > > > - || STMT_VINFO_DEF_TYPE (stmt_info) != > > > vect_reduction_def)) > > > + || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE > > > (stmt_info)))) > > > continue; > > > + > > > > ... and then, what does STMT_VINFO_INSIDE_OF_LOOP_COST of > > that "main" pattern stmt represent? Shouldn't it represent the cost > > of the whole sequence, and thus ... > > > > > vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * > > > factor; > > > /* FIXME: for stmts in the inner-loop in outer-loop > > > vectorization, > > > some of the "outside" costs are generated inside the > > > outer-loop. */ > > > vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info); > > > + if (is_pattern_stmt_p (stmt_info) > > > + && STMT_VINFO_PATTERN_DEF_SEQ (stmt_info)) > > > + { > > > + gimple_stmt_iterator gsi; > > > > The following is excessive? Especially as you probably count the "main" > > stmt twice? Thus, if the main stmt cost does not cover the sequence > > then you should skip adding its cost above? > > This is what I believe is going on here. Note that for the main pattern > statement, we have converted it so that stmt_info now points to its > related (replacement) statement. It is apparently possible for this > replacement statement to also be recognized as part of a new pattern > (I've seen references to such possibilities elsewhere in the vectorizer > code). When that happens, this code is counting the costs of that > pattern. > > Ira, can you please weigh in on this part? I'm not 100% certain of my > explanation.
Also, I forgot to mention that I think Richard is right that the replacement statement may be getting counted twice in this scenario. > > In any case, it's clear we need more comments in the code. :) > > Note that STMT_VINFO_PATTERN_DEF_SEQ doesn't exist in the 4.6 branch, so > this section has to be omitted if we backport it (which is desirable > since the degradation was introduced in 4.6). Removing it apparently > does not affect the sphinx3 degradation. > > > > > Thanks, > > Richard. > > > > > + for (gsi = gsi_start (STMT_VINFO_PATTERN_DEF_SEQ > > > (stmt_info)); > > > + !gsi_end_p (gsi); gsi_next (&gsi)) > > > + { > > > + gimple pattern_def_stmt = gsi_stmt (gsi); > > > + stmt_vec_info pattern_def_stmt_info > > > + = vinfo_for_stmt (pattern_def_stmt); > > > + if (STMT_VINFO_RELEVANT_P (pattern_def_stmt_info) > > > + || STMT_VINFO_LIVE_P (pattern_def_stmt_info)) > > > + { > > > + vec_inside_cost > > > + += STMT_VINFO_INSIDE_OF_LOOP_COST > > > + (pattern_def_stmt_info) * factor; > > > + vec_outside_cost > > > + += STMT_VINFO_OUTSIDE_OF_LOOP_COST > > > + (pattern_def_stmt_info); > > > + } > > > + } > > > + } > > > } > > > } > > > > > > Index: gcc/tree-vect-stmts.c > > > =================================================================== > > > --- gcc/tree-vect-stmts.c (revision 183944) > > > +++ gcc/tree-vect-stmts.c (working copy) > > > @@ -811,6 +811,46 @@ vect_model_simple_cost (stmt_vec_info stmt_info, i > > > } > > > > > > > > > +/* Model cost for type demotion and promotion operations. PWR is > > > normally > > > + zero for single-step promotions and demotions. It will be one if > > > + two-step promotion/demotion is required, and so on. Each additional > > > + step doubles the number of instructions required. */ > > > + > > > +static void > > > +vect_model_promotion_demotion_cost (stmt_vec_info stmt_info, > > > + enum vect_def_type *dt, int pwr) > > > +{ > > > + int i, tmp; > > > + int inside_cost = 0, outside_cost = 0, single_stmt_cost; > > > + > > > + /* The SLP costs were already calculated during SLP tree build. */ > > > + if (PURE_SLP_STMT (stmt_info)) > > > + return; > > > + > > > + single_stmt_cost = vect_get_stmt_cost (vec_promote_demote); > > > + for (i = 0; i < pwr + 1; i++) > > > + { > > > + tmp = (STMT_VINFO_TYPE (stmt_info) == > > > type_promotion_vec_info_type) ? > > > + (i + 1) : i; > > > + inside_cost += vect_pow2 (tmp) * single_stmt_cost; > > > + } > > > + > > > + /* FORNOW: Assuming maximum 2 args per stmts. */ > > > + for (i = 0; i < 2; i++) > > > + { > > > + if (dt[i] == vect_constant_def || dt[i] == vect_external_def) > > > + outside_cost += vect_get_stmt_cost (vector_stmt); > > > + } > > > + > > > + if (vect_print_dump_info (REPORT_COST)) > > > + fprintf (vect_dump, "vect_model_promotion_demotion_cost: inside_cost > > > = %d, " > > > + "outside_cost = %d .", inside_cost, outside_cost); > > > + > > > + /* Set the costs in STMT_INFO. */ > > > + stmt_vinfo_set_inside_of_loop_cost (stmt_info, NULL, inside_cost); > > > + stmt_vinfo_set_outside_of_loop_cost (stmt_info, NULL, outside_cost); > > > +} > > > + > > > /* Function vect_cost_strided_group_size > > > > > > For strided load or store, return the group_size only if it is the > > > first > > > @@ -887,7 +927,6 @@ vect_model_store_cost (stmt_vec_info stmt_info, in > > > if (vect_print_dump_info (REPORT_COST)) > > > fprintf (vect_dump, "vect_model_store_cost: strided group_size = > > > %d .", > > > group_size); > > > - > > > } > > > > > > /* Costs of the stores. */ > > > @@ -1049,7 +1088,7 @@ vect_get_load_cost (struct data_reference *dr, int > > > case dr_explicit_realign: > > > { > > > *inside_cost += ncopies * (2 * vect_get_stmt_cost (vector_load) > > > - + vect_get_stmt_cost (vector_stmt)); > > > + + vect_get_stmt_cost (vec_perm)); > > > > > > /* FIXME: If the misalignment remains fixed across the iterations > > > of > > > the containing loop, the following cost should be added to the > > > @@ -1057,6 +1096,9 @@ vect_get_load_cost (struct data_reference *dr, int > > > if (targetm.vectorize.builtin_mask_for_load) > > > *inside_cost += vect_get_stmt_cost (vector_stmt); > > > > > > + if (vect_print_dump_info (REPORT_COST)) > > > + fprintf (vect_dump, "vect_model_load_cost: explicit realign"); > > > + > > > break; > > > } > > > case dr_explicit_realign_optimized: > > > @@ -1080,7 +1122,12 @@ vect_get_load_cost (struct data_reference *dr, int > > > } > > > > > > *inside_cost += ncopies * (vect_get_stmt_cost (vector_load) > > > - + vect_get_stmt_cost (vector_stmt)); > > > + + vect_get_stmt_cost (vec_perm)); > > > + > > > + if (vect_print_dump_info (REPORT_COST)) > > > + fprintf (vect_dump, > > > + "vect_model_load_cost: explicit realign optimized"); > > > + > > > break; > > > } > > > > > > @@ -2392,16 +2439,19 @@ vectorizable_conversion (gimple stmt, gimple_stmt_ > > > if (vect_print_dump_info (REPORT_DETAILS)) > > > fprintf (vect_dump, "=== vectorizable_conversion ==="); > > > if (code == FIX_TRUNC_EXPR || code == FLOAT_EXPR) > > > - STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type; > > > + { > > > + STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type; > > > + vect_model_simple_cost (stmt_info, ncopies, dt, NULL); > > > + } > > > else if (modifier == NARROW) > > > { > > > STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type; > > > - vect_model_simple_cost (stmt_info, ncopies, dt, NULL); > > > + vect_model_promotion_demotion_cost (stmt_info, dt, > > > multi_step_cvt); > > > } > > > else > > > { > > > STMT_VINFO_TYPE (stmt_info) = type_promotion_vec_info_type; > > > - vect_model_simple_cost (stmt_info, 2 * ncopies, dt, NULL); > > > + vect_model_promotion_demotion_cost (stmt_info, dt, > > > multi_step_cvt); > > > } > > > VEC_free (tree, heap, interm_types); > > > return true; > > > Index: gcc/config/spu/spu.c > > > =================================================================== > > > --- gcc/config/spu/spu.c (revision 183944) > > > +++ gcc/config/spu/spu.c (working copy) > > > @@ -6920,6 +6920,7 @@ spu_builtin_vectorization_cost (enum vect_cost_for > > > case scalar_to_vec: > > > case cond_branch_not_taken: > > > case vec_perm: > > > + case vec_promote_demote: > > > return 1; > > > > > > case scalar_store: > > > Index: gcc/config/i386/i386.c > > > =================================================================== > > > --- gcc/config/i386/i386.c (revision 183944) > > > +++ gcc/config/i386/i386.c (working copy) > > > @@ -35336,6 +35336,7 @@ ix86_builtin_vectorization_cost (enum vect_cost_fo > > > return ix86_cost->cond_not_taken_branch_cost; > > > > > > case vec_perm: > > > + case vec_promote_demote: > > > return ix86_cost->vec_stmt_cost; > > > > > > default: > > > Index: gcc/config/rs6000/rs6000.c > > > =================================================================== > > > --- gcc/config/rs6000/rs6000.c (revision 183944) > > > +++ gcc/config/rs6000/rs6000.c (working copy) > > > @@ -3543,10 +3543,17 @@ rs6000_builtin_vectorization_cost (enum vect_cost_ > > > return 1; > > > > > > case vec_perm: > > > - if (!TARGET_VSX) > > > + if (TARGET_VSX) > > > + return 4; > > > + else > > > return 1; > > > - return 2; > > > > > > + case vec_promote_demote: > > > + if (TARGET_VSX) > > > + return 5; > > > + else > > > + return 1; > > > + > > > case cond_branch_taken: > > > return 3; > > > > > > > > > > >