On Sep 20, 2017, at 3:49 AM, Richard Sandiford <richard.sandif...@linaro.org> wrote: > > But I think this shows up another problem. In the vectorised loop, > we have 1 copy of the load and 4 copies of the ABS (after unpacking). > But vect_analyze_slp_cost_1 is being called with ncopies_for_cost == 4 > even for the loads. So I think it's a double whammy: we've quadrupling > the real cost via the excessive ncopies_for_cost, and we're using the > wrong access type too. > > The patch below should fix the second problem but isn't much use without > the first. I think we need to adjust ncopies_for_cost in the recursive > calls as well.
Unfortunately we have to deal with the fact that ncopies_for_cost is set once for the whole SLP instance, which is not ideal since we know the number of loads and stores doesn't follow the same rules. What about the following? I *think* that group_size / nunits is sufficient for VMAT_STRIDED_SLP and VMAT_CONTIGUOUS for which permuted loads are not possible (already handled differently), but I could well be missing something. Index: gcc/tree-vect-slp.c =================================================================== --- gcc/tree-vect-slp.c (revision 252760) +++ gcc/tree-vect-slp.c (working copy) @@ -1662,12 +1662,14 @@ vect_analyze_slp_cost_1 (slp_instance instance, sl stmt_info = vinfo_for_stmt (stmt); if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) { + int group_size = GROUP_SIZE (stmt_info); + int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info)); vect_memory_access_type memory_access_type - = (STMT_VINFO_STRIDED_P (stmt_info) + = (STMT_VINFO_STRIDED_P (stmt_info) && group_size > nunits ? VMAT_STRIDED_SLP : VMAT_CONTIGUOUS); if (DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info))) - vect_model_store_cost (stmt_info, ncopies_for_cost, + vect_model_store_cost (stmt_info, group_size / nunits, memory_access_type, vect_uninitialized_def, node, prologue_cost_vec, body_cost_vec); else @@ -1714,6 +1716,8 @@ vect_analyze_slp_cost_1 (slp_instance instance, sl + nunits - 1) / nunits); ncopies_for_cost *= SLP_INSTANCE_UNROLLING_FACTOR (instance); } + else + ncopies_for_cost = group_size / nunits; /* Record the cost for the vector loads. */ vect_model_load_cost (stmt_info, ncopies_for_cost, memory_access_type, node, prologue_cost_vec, Bill