On Sep 20, 2017, at 2:14 PM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote: > > On Sep 20, 2017, at 3:49 AM, Richard Sandiford <richard.sandif...@linaro.org> > wrote: >> >> But I think this shows up another problem. In the vectorised loop, >> we have 1 copy of the load and 4 copies of the ABS (after unpacking). >> But vect_analyze_slp_cost_1 is being called with ncopies_for_cost == 4 >> even for the loads. So I think it's a double whammy: we've quadrupling >> the real cost via the excessive ncopies_for_cost, and we're using the >> wrong access type too. >> >> The patch below should fix the second problem but isn't much use without >> the first. I think we need to adjust ncopies_for_cost in the recursive >> calls as well. > > Unfortunately we have to deal with the fact that ncopies_for_cost is set > once for the whole SLP instance, which is not ideal since we know the > number of loads and stores doesn't follow the same rules. > > What about the following? I *think* that group_size / nunits is sufficient > for > VMAT_STRIDED_SLP and VMAT_CONTIGUOUS for which permuted loads are > not possible (already handled differently), but I could well be missing > something.
Uh, I need to do the obvious rounding of (group_size + nunits - 1) / nunits... Bill > > Index: gcc/tree-vect-slp.c > =================================================================== > --- gcc/tree-vect-slp.c (revision 252760) > +++ gcc/tree-vect-slp.c (working copy) > @@ -1662,12 +1662,14 @@ vect_analyze_slp_cost_1 (slp_instance instance, sl > stmt_info = vinfo_for_stmt (stmt); > if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > { > + int group_size = GROUP_SIZE (stmt_info); > + int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info)); > vect_memory_access_type memory_access_type > - = (STMT_VINFO_STRIDED_P (stmt_info) > + = (STMT_VINFO_STRIDED_P (stmt_info) && group_size > nunits > ? VMAT_STRIDED_SLP > : VMAT_CONTIGUOUS); > if (DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info))) > - vect_model_store_cost (stmt_info, ncopies_for_cost, > + vect_model_store_cost (stmt_info, group_size / nunits, > memory_access_type, vect_uninitialized_def, > node, prologue_cost_vec, body_cost_vec); > else > @@ -1714,6 +1716,8 @@ vect_analyze_slp_cost_1 (slp_instance instance, sl > + nunits - 1) / nunits); > ncopies_for_cost *= SLP_INSTANCE_UNROLLING_FACTOR (instance); > } > + else > + ncopies_for_cost = group_size / nunits; > /* Record the cost for the vector loads. */ > vect_model_load_cost (stmt_info, ncopies_for_cost, > memory_access_type, node, prologue_cost_vec, > > Bill >