Tamar Christina <tamar.christ...@arm.com> writes:
> Hi All,
>
> While chasing down a costing difference between SLP and non-SLP for memory
> access costing I noticed that at some point the SLP and non-SLP costing have
> diverged.  It used to be we only supported LOAD_LANES in SLP and so the 
> non-SLP
> costing was working fine.
>
> But with the change to SLP only we now lost costing.
>
> It looks like the vectorizer for non-SLP stores the VMAT type in
> STMT_VINFO_MEMORY_ACCESS_TYPE on the stmt_info, but for SLP it stores it in
> SLP_TREE_MEMORY_ACCESS_TYPE which is on the SLP node itself.
>
> While my first attempt of a patch was to just also store the VMAT in the
> stmt_info https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665295.html
> Richi pointed out that this goes wrong when the same access is used Hybrid.
>
> And so we have to do a backend specific fix.  To help out other backends this
> also introduces a generic helper function suggested by Richi in that patch
> (I hope that's ok.. I didn't want to split out just the helper.)
>
> This successfully restores VMAT based costing in the new SLP only world.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>       * tree-vectorizer.h (vect_mem_access_type): New.
>       * config/aarch64/aarch64.cc (aarch64_ld234_st234_vectors): Use it.
>       (aarch64_detect_vector_stmt_subtype): Likewise.
>       (aarch64_adjust_stmt_cost): Likewise.
>       (aarch64_vector_costs::count_ops): Likewise.
>       (aarch64_vector_costs::add_stmt_cost): Make SLP node named.

Looks OK, but some formatting trivia:

>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 102680a0efca1ce928e6945033c01cfb68a65152..055b0ff47c68dc5e7560debe5a29dcdc9df21f8c
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -16278,7 +16278,7 @@ public:
>  private:
>    void record_potential_advsimd_unrolling (loop_vec_info);
>    void analyze_loop_vinfo (loop_vec_info);
> -  void count_ops (unsigned int, vect_cost_for_stmt, stmt_vec_info,
> +  void count_ops (unsigned int, vect_cost_for_stmt, stmt_vec_info, slp_tree,
>                 aarch64_vec_op_count *);
>    fractional_cost adjust_body_cost_sve (const aarch64_vec_op_count *,
>                                       fractional_cost, unsigned int,
> @@ -16599,7 +16599,8 @@ aarch64_builtin_vectorization_cost (enum 
> vect_cost_for_stmt type_of_cost,
>     vector of an LD[234] or ST[234] operation.  Return the total number of
>     vectors (2, 3 or 4) if so, otherwise return a value outside that range.  
> */
>  static int
> -aarch64_ld234_st234_vectors (vect_cost_for_stmt kind, stmt_vec_info 
> stmt_info)
> +aarch64_ld234_st234_vectors (vect_cost_for_stmt kind, stmt_vec_info 
> stmt_info,
> +                          slp_tree node)

Normally the comment should be updated to mention NODE.  But it probably
isn't worth it, since presumably after the SLP transition, everything
could be keyed off the node only (meaning a larger rewrite).

>  {
>    if ((kind == vector_load
>         || kind == unaligned_load
> @@ -16609,7 +16610,7 @@ aarch64_ld234_st234_vectors (vect_cost_for_stmt kind, 
> stmt_vec_info stmt_info)
>      {
>        stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
>        if (stmt_info
> -       && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_LOAD_STORE_LANES)
> +       && vect_mem_access_type (stmt_info, node) == VMAT_LOAD_STORE_LANES)
>       return DR_GROUP_SIZE (stmt_info);
>      }
>    return 0;
> @@ -16847,14 +16848,15 @@ aarch64_detect_scalar_stmt_subtype (vec_info 
> *vinfo, vect_cost_for_stmt kind,
>  }
>  
>  /* STMT_COST is the cost calculated by aarch64_builtin_vectorization_cost
> -   for the vectorized form of STMT_INFO, which has cost kind KIND and which
> -   when vectorized would operate on vector type VECTYPE.  Try to subdivide
> -   the target-independent categorization provided by KIND to get a more
> +   for the vectorized form of STMT_INFO possibly using SLP node NODE, which 
> has cost
> +   kind KIND and which when vectorized would operate on vector type VECTYPE. 
>  Try to
> +   subdivide the target-independent categorization provided by KIND to get a 
> more
>     accurate cost.  WHERE specifies where the cost associated with KIND
>     occurs.  */

Needs to be reflowed to 80 characters (all the "+" lines are longer).

>  static fractional_cost
>  aarch64_detect_vector_stmt_subtype (vec_info *vinfo, vect_cost_for_stmt kind,
> -                                 stmt_vec_info stmt_info, tree vectype,
> +                                 stmt_vec_info stmt_info, slp_tree node,
> +                                 tree vectype,
>                                   enum vect_cost_model_location where,
>                                   fractional_cost stmt_cost)
>  {
> [...]
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 
> 2775d873ca42436fb6b6789ca8102c03e2540b4f..eafab59fdf53e570ebbacebbec5533fcd3dff509
>  100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2720,6 +2720,17 @@ vect_is_reduction (stmt_vec_info stmt_info)
>    return STMT_VINFO_REDUC_IDX (stmt_info) >= 0;
>  }
>  
> +/* Returns the memory acccess type being used to vectorize the statement.  
> If SLP

Long line here too.

> +   this is read from NODE, otherwise it's read from the STMT_VINFO.  */
> +
> +inline vect_memory_access_type
> +vect_mem_access_type (stmt_vec_info stmt_info, slp_tree node)
> +{
> +  if (node)
> +    return SLP_TREE_MEMORY_ACCESS_TYPE (node);
> +  else
> +    return STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info);
> +}
>  /* If STMT_INFO describes a reduction, return the vect_reduction_type

Missing blank line between functions.

OK with those changes, thanks.

Richard

>     of the reduction it describes, otherwise return -1.  */
>  inline int

Reply via email to