If the autovectorizer tries to load a GCN 64-lane vector elementwise, it blows away the register file and produces horrible code.
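Roughly, the current generic expansion turns such a load into one scalar load plus one element insert per lane. A conceptual sketch in plain GNU C (the type and function names below are invented for illustration; this is not the vectorizer's actual output):

    /* Illustration only: v64sf and elementwise_load are made-up names.  */
    typedef float v64sf __attribute__ ((vector_size (64 * sizeof (float))));

    v64sf
    elementwise_load (const float *base, long step)
    {
      v64sf v;
      for (int i = 0; i < 64; i++)   /* emitted as straight-line code */
        v[i] = base[i * step];       /* 64 scalar loads + 64 element inserts */
      return v;
    }

With up to 64 scalar values and the partly-built vector live at once, the GCN register file is quickly exhausted.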
This patch simply disallows elementwise loads for such large vectors. Is there a better way to disable this in the middle-end?

2018-09-05  Julian Brown  <jul...@codesourcery.com>

	gcc/
	* tree-vect-stmts.c (get_load_store_type): Don't use
	VMAT_ELEMENTWISE loads/stores with many-element (>=64) vectors.
---
 gcc/tree-vect-stmts.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 8875201..a333991 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2452,6 +2452,26 @@ get_load_store_type (stmt_vec_info stmt_info, tree vectype, bool slp,
       *memory_access_type = VMAT_CONTIGUOUS;
     }
 
+  /* FIXME: Element-wise accesses can be extremely expensive if we have a
+     large number of elements to deal with (e.g. 64 for AMD GCN) using the
+     current generic code expansion.  Until an efficient code sequence is
+     supported for affected targets instead, don't attempt vectorization for
+     VMAT_ELEMENTWISE at all.  */
+  if (*memory_access_type == VMAT_ELEMENTWISE)
+    {
+      poly_uint64 nelements = TYPE_VECTOR_SUBPARTS (vectype);
+
+      if (maybe_ge (nelements, 64))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "too many elements (%u) for elementwise accesses\n",
+			     (unsigned) nelements.to_constant ());
+
+	  return false;
+	}
+    }
+
   if ((*memory_access_type == VMAT_ELEMENTWISE
        || *memory_access_type == VMAT_STRIDED_SLP)
      && !nunits.is_constant ())
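To spell out how the new test behaves, here is a hypothetical standalone helper, for illustration only and not part of the patch (too_many_elements_p is a made-up name, and this is my reading of the poly_int comparison semantics):

    /* Illustration only: mirrors the check added above.  */
    static bool
    too_many_elements_p (tree vectype)
    {
      poly_uint64 nelements = TYPE_VECTOR_SUBPARTS (vectype);

      /* maybe_ge is the conservative direction of the comparison: true
	 whenever the element count could be 64 or more at run time.  So
	 it fires for GCN's constant 64-lane vectors, never for, say, a
	 constant 16-lane vector, and (as far as I understand poly_int)
	 also for variable-length vectors that might reach 64 lanes.  */
      return maybe_ge (nelements, 64);
    }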