If the autovectorizer tries to load a GCN 64-lane vector elementwise, it
blows away the register file and produces horrible code.
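
To make the cost concrete, here is a rough sketch (using generic vector
extensions rather than the gimple the vectorizer actually emits) of what an
element-wise load of a single 64-lane vector amounts to.  The generic
expansion is essentially the fully unrolled form of this loop, with one
scalar load per lane feeding a vector constructor, so all 64 scalar results
are live at once:

typedef float v64sf __attribute__ ((vector_size (64 * sizeof (float))));

/* Rough equivalent of one element-wise vector load: a scalar load per
   lane, packed into the vector afterwards.  */
v64sf
load_one_vector_elementwise (float *p, long stride)
{
  v64sf v;
  for (int i = 0; i < 64; i++)
    v[i] = p[i * stride];
  return v;
}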

This patch simply disallows elementwise loads and stores for such large
vectors.  Is there a better way to disable this in the middle-end?
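
As a hypothetical reduced example (not a testcase taken from the GCN work),
a read whose stride is unknown at compile time is the sort of access that
gets classified as VMAT_ELEMENTWISE when no gather or strided load is used.
With the patch, a loop like this is simply left unvectorized on affected
targets, and the missed-optimization dump reports the new "too many
elements ... for elementwise accesses" message:

void
strided_read (float *restrict out, float *restrict in, int stride, int n)
{
  /* Variable stride: without gather support each vector load would have
     to be assembled element by element.  */
  for (int i = 0; i < n; i++)
    out[i] = in[i * stride];
}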

2018-09-05  Julian Brown  <jul...@codesourcery.com>

        gcc/
        * tree-vect-stmts.c (get_load_store_type): Don't use VMAT_ELEMENTWISE
        loads/stores with many-element (>=64) vectors.
---
 gcc/tree-vect-stmts.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 8875201..a333991 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2452,6 +2452,26 @@ get_load_store_type (stmt_vec_info stmt_info, tree vectype, bool slp,
 	*memory_access_type = VMAT_CONTIGUOUS;
     }
 
+  /* FIXME: Element-wise accesses can be extremely expensive when there is a
+     large number of elements to deal with (e.g. 64 for AMD GCN), because the
+     current generic code expansion emits one scalar access per element.
+     Until an efficient code sequence is implemented for affected targets,
+     don't attempt vectorization with VMAT_ELEMENTWISE at all.  */
+  if (*memory_access_type == VMAT_ELEMENTWISE)
+    {
+      poly_uint64 nelements = TYPE_VECTOR_SUBPARTS (vectype);
+
+      if (maybe_ge (nelements, 64))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+	      "too many elements (%u) for elementwise accesses\n",
+	      (unsigned) constant_lower_bound (nelements));
+
+	  return false;
+	}
+    }
+
   if ((*memory_access_type == VMAT_ELEMENTWISE
        || *memory_access_type == VMAT_STRIDED_SLP)
       && !nunits.is_constant ())
