Re: [PATCH][AArch64] Tweak Cortex-A57 vector cost

Richard Earnshaw Fri, 11 Nov 2016 02:18:09 -0800

On 10/11/16 17:10, Wilco Dijkstra wrote:
> The existing vector costs stop some beneficial vectorization.  This is
> mostly due to vector statement cost being set to 3 as well as vector loads
> having a higher cost than scalar loads.  This means that even when we
> vectorize 4x, it is possible that the cost of a vectorized loop is similar
> to the scalar version, and we fail to vectorize.  For example for a
> particular loop the costs for -mcpu=generic are:
> 
> note: Cost model analysis: 
>   Vector inside of loop cost: 146
>   Vector prologue cost: 5
>   Vector epilogue cost: 0
>   Scalar iteration cost: 50
>   Scalar outside cost: 0
>   Vector outside cost: 5
>   prologue iterations: 0
>   epilogue iterations: 0
>   Calculated minimum iters for profitability: 1
> note:   Runtime profitability threshold = 3
> note:   Static estimate profitability threshold = 3
> note: loop vectorized
> 
> 
> While -mcpu=cortex-a57 reports:
> 
> note: Cost model analysis: 
>   Vector inside of loop cost: 294
>   Vector prologue cost: 15
>   Vector epilogue cost: 0
>   Scalar iteration cost: 74
>   Scalar outside cost: 0
>   Vector outside cost: 15
>   prologue iterations: 0
>   epilogue iterations: 0
>   Calculated minimum iters for profitability: 31
> note:   Runtime profitability threshold = 30
> note:   Static estimate profitability threshold = 30
> note: not vectorized: vectorization not profitable.
> note: not vectorized: iteration count smaller than user specified loop bound 
> parameter or minimum profitable iterations (whichever is more conservative).
> 
> 
> Using a cost of 3 for a vector operation suggests it is 3 times as
> expensive as a scalar operation.  Since most vector operations have
> throughput similar to that of scalar operations, this is not correct.
> 
> Using slightly lower values for these costs now allows this loop
> and many others to be vectorized.  On a proprietary benchmark the gain
> from vectorizing this loop is around 15-30%, which shows vectorizing it is
> indeed beneficial.
> 
> ChangeLog:
> 2016-11-10  Wilco Dijkstra  <wdijk...@arm.com>
> 
>       * config/aarch64/aarch64.c (cortexa57_vector_cost):
>       Change vec_stmt_cost, vec_align_load_cost and vec_unalign_load_cost.
>


OK.

R.

> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 279a6dfaa4a9c306bc7a8dba9f4f53704f61fefe..cff2e8fc6e9309e6aa4f68a5aba3bfac3b737283 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -382,12 +382,12 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
>    1, /* scalar_stmt_cost  */
>    4, /* scalar_load_cost  */
>    1, /* scalar_store_cost  */
> -  3, /* vec_stmt_cost  */
> +  2, /* vec_stmt_cost  */
>    3, /* vec_permute_cost  */
>    8, /* vec_to_scalar_cost  */
>    8, /* scalar_to_vec_cost  */
> -  5, /* vec_align_load_cost  */
> -  5, /* vec_unalign_load_cost  */
> +  4, /* vec_align_load_cost  */
> +  4, /* vec_unalign_load_cost  */
>    1, /* vec_unalign_store_cost  */
>    1, /* vec_store_cost  */
>    1, /* cond_taken_branch_cost  */
>
