ping

PR79262 has been fixed for almost all AArch64 CPUs; however, the example is
still vectorized in a few cases, resulting in lower performance. Increase the
cost of vector-to-scalar moves so it is more similar to the other vector
costs. As a result, -mcpu=cortex-a53 no longer vectorizes the testcase, and
libquantum and SPECv6 performance improves.
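
To illustrate why raising a single cost entry can flip the vectorization
decision, here is a toy sketch. It is not GCC's actual cost model: the struct,
the function names, and the loop parameters are all hypothetical, and the real
vectorizer takes many more factors into account. It only shows that when a
loop needs vector-to-scalar moves, a higher vec_to_scalar_cost can push the
estimated vector cost above the scalar cost.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified cost table (not GCC's cpu_vector_cost). */
struct toy_vector_cost
{
  int scalar_stmt_cost;   /* cost of one scalar statement */
  int vec_stmt_cost;      /* cost of one vector statement */
  int vec_to_scalar_cost; /* cost of one vector-to-scalar move */
};

/* Estimated cost of the scalar loop: every iteration runs every statement. */
static int
toy_scalar_cost (const struct toy_vector_cost *c, int iters, int stmts)
{
  return iters * stmts * c->scalar_stmt_cost;
}

/* Estimated cost of the vector loop: iters / vf vector iterations, each
   running the vector statements plus some vector-to-scalar moves.  */
static int
toy_vector_cost (const struct toy_vector_cost *c, int iters, int vf,
                 int stmts, int extracts)
{
  return (iters / vf) * (stmts * c->vec_stmt_cost
                         + extracts * c->vec_to_scalar_cost);
}

/* Vectorize only if the vector estimate is strictly cheaper.  */
static bool
toy_should_vectorize (const struct toy_vector_cost *c, int iters, int vf,
                      int stmts, int extracts)
{
  return toy_vector_cost (c, iters, vf, stmts, extracts)
         < toy_scalar_cost (c, iters, stmts);
}
```

With 4 iterations, a vectorization factor of 2, 3 statements and 2
vector-to-scalar moves per vector iteration, this toy model prefers the vector
loop when vec_to_scalar_cost is 1 (10 versus 12) and the scalar loop when it
is 2 (14 versus 12).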

OK for commit?

ChangeLog:
2018-01-22  Wilco Dijkstra  <wdijk...@arm.com>

	PR target/79262
	* config/aarch64/aarch64.c (generic_vector_cost): Adjust
	vec_to_scalar_cost.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost =
   1, /* vec_int_stmt_cost  */
   1, /* vec_fp_stmt_cost  */
   2, /* vec_permute_cost  */
-  1, /* vec_to_scalar_cost  */
+  2, /* vec_to_scalar_cost  */
   1, /* scalar_to_vec_cost  */
   1, /* vec_align_load_cost  */
   1, /* vec_unalign_load_cost  */