[PATCH, i386]: Fix PR77621 (target part), handle Atom V2DFmode tuning through vector cost infrastructure

Uros Bizjak Tue, 20 Sep 2016 10:04:12 -0700

Hello!

The attached patch improves Atom V2DFmode vectorization by penalizing
V2DFmode insns through the vector cost infrastructure. According to
Agner Fog's instruction tables, only V2DFmode arithmetic insns
(e.g. addpd, maxpd, sqrtpd, ...) have increased latencies, so we
shouldn't fully disable V2DFmode vectorization with a "big hammer"
approach.


As suggested in the PR, we now increase the cost of the problematic
insns in ix86_add_stmt_cost. This is enough to prevent vectorization of
V2DFmode loops, while still allowing vectorization of complex, mostly
integer loops, where a single problematic insn would otherwise have
blocked it.

(BTW: The factor 5 is arbitrary, but is based on the ratio between the
latencies of V2DFmode and DFmode insns.)
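
To illustrate the effect of the penalty, here is a toy model of the
check. It is only a sketch, not GCC code: every toy_* name is made up
for illustration, and the base costs are whatever the caller passes in.
The flag tune_vectorize_double is assumed to play the role of
TARGET_VECTORIZE_DOUBLE.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in GCC.  */
enum toy_kind { TOY_SCALAR_STMT, TOY_VECTOR_STMT };

struct toy_stmt
{
  enum toy_kind kind;
  bool inner_mode_is_double;  /* Element mode would be DFmode.  */
  int base_cost;              /* Cost before any tuning penalty.  */
};

/* Same shape as the patched check: only vector statements with a
   double element mode are penalized, and only when the tuning says
   that double vectorization is slow.  */
int
toy_stmt_cost (const struct toy_stmt *s, bool tune_vectorize_double)
{
  int cost = s->base_cost;

  if (s->kind == TOY_VECTOR_STMT
      && !tune_vectorize_double
      && s->inner_mode_is_double)
    cost *= 5;  /* Same arbitrary factor as in the patch.  */

  return cost;
}

/* Sum the per-statement costs of a loop body.  */
int
toy_body_cost (const struct toy_stmt *stmts, int n,
               bool tune_vectorize_double)
{
  int total = 0;

  for (int i = 0; i < n; i++)
    total += toy_stmt_cost (&stmts[i], tune_vectorize_double);

  return total;
}
```

In this model, a body made mostly of integer vector statements sees only
a small increase from one penalized statement, while a body made entirely
of double vector statements becomes five times as expensive.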

2016-09-20  Uros Bizjak  <ubiz...@gmail.com>

    PR target/77621
    * config/i386/i386.c (ix86_preferred_simd_mode) <case DFmode>:
    Don't return word_mode for !TARGET_VECTORIZE_DOUBLE.
    (ix86_add_stmt_cost): Penalize DFmode vector operations
    for !TARGET_VECTORIZE_DOUBLE.

testsuite/ChangeLog:

2016-09-20  Uros Bizjak  <ubiz...@gmail.com>

    PR target/77621
    * gcc.target/i386/pr77621.c: New test.
    * gcc.target/i386/vect-double-2.c: Update scan-tree-dump-times
    pattern, loop should vectorize with -mtune=atom.

The patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c  (revision 240263)
+++ config/i386/i386.c  (working copy)
@@ -49554,9 +49554,7 @@ ix86_preferred_simd_mode (machine_mode mode)
        return V4SFmode;
 
     case DFmode:
-      if (!TARGET_VECTORIZE_DOUBLE)
-       return word_mode;
-      else if (TARGET_AVX512F)
+      if (TARGET_AVX512F)
        return V8DFmode;
       else if (TARGET_AVX && !TARGET_PREFER_AVX128)
        return V4DFmode;
@@ -49647,9 +49645,14 @@ ix86_add_stmt_cost (void *data, int count, enum ve
   tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
   int stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
 
+  /* Penalize DFmode vector operations for !TARGET_VECTORIZE_DOUBLE.  */
+  if (kind == vector_stmt && !TARGET_VECTORIZE_DOUBLE
+      && vectype && GET_MODE_INNER (TYPE_MODE (vectype)) == DFmode)
+    stmt_cost *= 5;  /* FIXME: The value here is arbitrary.  */
+
   /* Statements in an inner loop relative to the loop being
      vectorized are weighted more heavily.  The value here is
-      arbitrary and could potentially be improved with analysis.  */
+     arbitrary and could potentially be improved with analysis.  */
   if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
     count *= 50;  /* FIXME.  */
 
Index: testsuite/gcc.target/i386/pr77621.c
===================================================================
--- testsuite/gcc.target/i386/pr77621.c (nonexistent)
+++ testsuite/gcc.target/i386/pr77621.c (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mtune=atom -msse2 -fdump-tree-vect-stats" } */
+
+void
+foo (double *x, int *y)
+{
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] -= y[i] * x[i + 1];
+}
+
+/* { dg-final { scan-tree-dump-not "Vectorized loops: 1" "vect" } } */
Index: testsuite/gcc.target/i386/vect-double-2.c
===================================================================
--- testsuite/gcc.target/i386/vect-double-2.c   (revision 240263)
+++ testsuite/gcc.target/i386/vect-double-2.c   (working copy)
@@ -31,4 +31,4 @@ sse2_test (void)
     }
 }
 
-/* { dg-final { scan-tree-dump-not "vectorized 1 loops" "vect" } } */
+/* { dg-final { scan-tree-dump-times "Vectorized loops: 1" 1 "vect" } } */
