Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> ping
>
> PR79262 has been fixed for almost all AArch64 cpus, however the example is still
> vectorized in a few cases, resulting in lower performance.  Increase the cost of
> vector-to-scalar moves so it is more similar to the other vector costs.  As a
> result -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and
> SPECv6 performance improves.
>
> OK for commit?
>
> ChangeLog:
> 2018-01-22  Wilco Dijkstra  <wdijk...@arm.com>
>
>	PR target/79262
>	* config/aarch64/aarch64.c (generic_vector_cost): Adjust
>	vec_to_scalar_cost.
OK, thanks, and sorry for the delay.

qdf24xx_vector_cost is the only specific CPU cost table with a
vec_to_scalar_cost as low as 1.  It's not obvious how emphatic that
choice is though.  It looks like qdf24xx_vector_cost might (very
reasonably!) have started out as a copy of the generic costs with some
targeted changes.

But even if 1 is accurate there from a h/w perspective, the problem is
that the vectoriser's costings have a tendency to miss additional
overhead involved in scalarisation.  Although increasing the cost to
avoid that might be a bit of a hack, it's the accepted hack.  So I
suspect in practice all CPUs will benefit from a higher cost, not just
those whose CPU tables already have one.  On that basis, increasing the
generic cost by the smallest possible amount should be a good change
across the board.

If anyone finds a counter-example, please let us know or file a bug.

Richard

> --
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost =
>    1, /* vec_int_stmt_cost  */
>    1, /* vec_fp_stmt_cost  */
>    2, /* vec_permute_cost  */
> -  1, /* vec_to_scalar_cost  */
> +  2, /* vec_to_scalar_cost  */
>    1, /* scalar_to_vec_cost  */
>    1, /* vec_align_load_cost  */
>    1, /* vec_unalign_load_cost  */
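[Editor's note: for readers without the PR in front of them, here is a minimal C sketch of the kind of loop where the scalarisation overhead discussed above shows up.  This is a hypothetical illustration, not the actual PR79262 testcase: if the idx[] loads are vectorised, each lane still has to be transferred to a general-purpose register before it can be used for addressing, and the cost model charges each such transfer at vec_to_scalar_cost.]

```c
#include <assert.h>

/* Hypothetical sketch: a vectorised version of this loop can load
   idx[i..i+3] as one vector, but every index must then be moved from a
   vector register to a general register to form the dst[idx[i]]
   address, so each iteration pays a vector-to-scalar move on top of
   the vector load.  */
static void
scatter_add (unsigned long *dst, const unsigned long *idx,
	     unsigned long v, long n)
{
  for (long i = 0; i < n; i++)
    dst[idx[i]] += v;
}
```

With vec_to_scalar_cost at 1, the vector version of such a loop can look artificially cheap because those per-lane moves are under-charged; raising the generic cost to 2 biases the comparison back toward the scalar loop.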