On Sep 20, 2017, at 2:14 PM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
> 
> On Sep 20, 2017, at 3:49 AM, Richard Sandiford <richard.sandif...@linaro.org> 
> wrote:
>> 
>> But I think this shows up another problem.  In the vectorised loop,
>> we have 1 copy of the load and 4 copies of the ABS (after unpacking).
>> But vect_analyze_slp_cost_1 is being called with ncopies_for_cost == 4
>> even for the loads.  So I think it's a double whammy: we've quadrupling
>> the real cost via the excessive ncopies_for_cost, and we're using the
>> wrong access type too.
>> 
>> The patch below should fix the second problem but isn't much use without
>> the first.  I think we need to adjust ncopies_for_cost in the recursive
>> calls as well.
> 
> Unfortunately we have to deal with the fact that ncopies_for_cost is set
> once for the whole SLP instance, which is not ideal since we know the
> number of loads and stores doesn't follow the same rules.
> 
> What about the following?  I *think* that group_size / nunits is sufficient 
> for
> VMAT_STRIDED_SLP and VMAT_CONTIGUOUS for which permuted loads are
> not possible (already handled differently), but I could well be missing 
> something.

Uh, I need to do the obvious rounding of (group_size + nunits - 1) / nunits...

Bill
> 
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c       (revision 252760)
> +++ gcc/tree-vect-slp.c       (working copy)
> @@ -1662,12 +1662,14 @@ vect_analyze_slp_cost_1 (slp_instance instance, sl
>   stmt_info = vinfo_for_stmt (stmt);
>   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>     {
> +      int group_size = GROUP_SIZE (stmt_info);
> +      int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
>       vect_memory_access_type memory_access_type
> -     = (STMT_VINFO_STRIDED_P (stmt_info)
> +     = (STMT_VINFO_STRIDED_P (stmt_info) && group_size > nunits
>          ? VMAT_STRIDED_SLP
>          : VMAT_CONTIGUOUS);
>       if (DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)))
> -     vect_model_store_cost (stmt_info, ncopies_for_cost,
> +     vect_model_store_cost (stmt_info, group_size / nunits,
>                              memory_access_type, vect_uninitialized_def,
>                              node, prologue_cost_vec, body_cost_vec);
>       else
> @@ -1714,6 +1716,8 @@ vect_analyze_slp_cost_1 (slp_instance instance, sl
>                             + nunits - 1) / nunits);
>             ncopies_for_cost *= SLP_INSTANCE_UNROLLING_FACTOR (instance);
>           }
> +       else
> +         ncopies_for_cost = group_size / nunits;
>         /* Record the cost for the vector loads.  */
>         vect_model_load_cost (stmt_info, ncopies_for_cost,
>                               memory_access_type, node, prologue_cost_vec,
> 
> Bill
> 

Reply via email to