On Mon, Jan 22, 2018 at 4:01 PM, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
> PR79262 has been fixed for almost all AArch64 CPUs; however, the example is
> still vectorized in a few cases, resulting in lower performance. Increase the
> cost of vector-to-scalar moves so it is more similar to the other vector
> costs. As a result, -mcpu=cortex-a53 no longer vectorizes the testcase, and
> libquantum and SPECv6 performance improves.
>
> OK for commit?
It would be better to dissect this cost into vec_to_scalar and vec_extract,
where vec_to_scalar really means getting at the scalar value of a vector of
uniform values, which most targets can do without any instruction (just use
a subreg).

I suppose we could also make vec_to_scalar equal to vector extraction and
remove the uses for the other case (reduction vector result to scalar reg).

Richard.

> ChangeLog:
> 2018-01-22  Wilco Dijkstra  <wdijk...@arm.com>
>
>         PR target/79262
>         * config/aarch64/aarch64.c (generic_vector_cost): Adjust
>         vec_to_scalar_cost.
> --
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost =
>    1, /* vec_int_stmt_cost */
>    1, /* vec_fp_stmt_cost */
>    2, /* vec_permute_cost */
> -  1, /* vec_to_scalar_cost */
> +  2, /* vec_to_scalar_cost */
>    1, /* scalar_to_vec_cost */
>    1, /* vec_align_load_cost */
>    1, /* vec_unalign_load_cost */