Hello!

The attached patch improves V2DFmode vectorization on Atom by penalizing V2DFmode insns through the vector cost infrastructure. Agner Fog's instruction tables show that only V2DFmode arithmetic insns (e.g. addpd, maxpd, sqrtpd, ...) have increased latencies, so we shouldn't fully disable V2DFmode vectorization with a "big hammer" approach.
As suggested in the PR, we now increase the cost of problematic insns in ix86_add_stmt_cost. This is enough to prevent vectorization of V2DFmode loops, but it still allows vectorization in cases where a single problematic insn would otherwise block vectorization of a complex, mostly integer loop. (BTW: The factor of 5 is arbitrary, but it is based on the ratio between the latencies of V2DFmode and DFmode insns.)

2016-09-20  Uros Bizjak  <ubiz...@gmail.com>

	PR target/77621
	* config/i386/i386.c (ix86_preferred_simd_mode) <case DFmode>:
	Don't return word_mode for !TARGET_VECTORIZE_DOUBLE.
	(ix86_add_stmt_cost): Penalize DFmode vector operations
	for !TARGET_VECTORIZE_DOUBLE.

testsuite/ChangeLog:

2016-09-20  Uros Bizjak  <ubiz...@gmail.com>

	PR target/77621
	* gcc.target/i386/pr77621.c: New test.
	* gcc.target/i386/vect-double-2.c: Update scan-tree-dump-times
	pattern; loop should vectorize with -mtune=atom.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
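The profitability argument above can be illustrated with a deliberately simplified model. This is a hypothetical toy, not GCC's vectorizer: it assumes unit cost for every scalar and vector statement and a single vectorization factor for the whole loop, and it ignores loads, stores, conversions and prologue/epilogue costs. Only the factor of 5 comes from the patch; the names toy_loop, scalar_cost, vector_cost and profitable are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not GCC's vectorizer: unit base cost for every
   statement; vector DFmode statements are multiplied by DF_PENALTY when
   double vectorization is disfavoured (the patch's factor of 5).  */
#define DF_PENALTY 5

struct toy_loop
{
  int int_stmts;  /* integer arithmetic statements per scalar iteration */
  int df_stmts;   /* double arithmetic statements per scalar iteration */
  int vf;         /* vectorization factor (elements per vector) */
};

/* Cost of VF scalar iterations, i.e. the work one vector iteration replaces.  */
static int
scalar_cost (const struct toy_loop *l)
{
  assert (l->vf > 0);
  return (l->int_stmts + l->df_stmts) * l->vf;
}

/* Cost of one vector iteration; DF statements are penalized unless
   vectorize_double (standing in for TARGET_VECTORIZE_DOUBLE) is set.  */
static int
vector_cost (const struct toy_loop *l, bool vectorize_double)
{
  int df_cost = vectorize_double ? 1 : DF_PENALTY;
  return l->int_stmts + l->df_stmts * df_cost;
}

/* Vectorize only when the vector body is strictly cheaper.  */
static bool
profitable (const struct toy_loop *l, bool vectorize_double)
{
  return vector_cost (l, vectorize_double) < scalar_cost (l);
}
```

With the penalty in place, a loop with three double operations and VF 2 (`{ 0, 3, 2 }`) costs 15 vectorized against 6 scalar and stays scalar, while a loop with twenty integer operations and one double operation (`{ 20, 1, 2 }`) costs 25 against 42 and is still vectorized. Under the old "big hammer", any loop containing a DFmode operation would have been left scalar.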
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 240263)
+++ config/i386/i386.c	(working copy)
@@ -49554,9 +49554,7 @@ ix86_preferred_simd_mode (machine_mode mode)
         return V4SFmode;
 
     case DFmode:
-      if (!TARGET_VECTORIZE_DOUBLE)
-        return word_mode;
-      else if (TARGET_AVX512F)
+      if (TARGET_AVX512F)
         return V8DFmode;
       else if (TARGET_AVX && !TARGET_PREFER_AVX128)
         return V4DFmode;
@@ -49647,9 +49645,14 @@ ix86_add_stmt_cost (void *data, int count, enum ve
   tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
   int stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
 
+  /* Penalize DFmode vector operations for !TARGET_VECTORIZE_DOUBLE.  */
+  if (kind == vector_stmt && !TARGET_VECTORIZE_DOUBLE
+      && vectype && GET_MODE_INNER (TYPE_MODE (vectype)) == DFmode)
+    stmt_cost *= 5;  /* FIXME: The value here is arbitrary.  */
+
   /* Statements in an inner loop relative to the loop being
      vectorized are weighted more heavily.  The value here is
-     arbitrary and could potentially be improved with analysis.  */
+     arbitrary and could potentially be improved with analysis.  */
   if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
     count *= 50;  /* FIXME.  */
 
Index: testsuite/gcc.target/i386/pr77621.c
===================================================================
--- testsuite/gcc.target/i386/pr77621.c	(nonexistent)
+++ testsuite/gcc.target/i386/pr77621.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mtune=atom -msse2 -fdump-tree-vect-stats" } */
+
+void
+foo (double *x, int *y)
+{
+  int i;
+  for (i = 0; i < 8; i++)
+    x[i] -= y[i] * x[i + 1];
+}
+
+/* { dg-final { scan-tree-dump-not "Vectorized loops: 1" "vect" } } */
Index: testsuite/gcc.target/i386/vect-double-2.c
===================================================================
--- testsuite/gcc.target/i386/vect-double-2.c	(revision 240263)
+++ testsuite/gcc.target/i386/vect-double-2.c	(working copy)
@@ -31,4 +31,4 @@ sse2_test (void)
     }
 }
 
-/* { dg-final { scan-tree-dump-not "vectorized 1 loops" "vect" } } */
+/* { dg-final { scan-tree-dump-times "Vectorized loops: 1" 1 "vect" } } */
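In the ix86_add_stmt_cost hunk, the DFmode penalty scales stmt_cost before the existing inner-loop weighting scales count, so the two factors multiply. A minimal sketch of that ordering follows, assuming (from the surrounding GCC code, which this hunk does not show) that the hook then adds count * stmt_cost to the per-location cost array. The enums, the struct and toy_add_stmt_cost are hypothetical stand-ins, not GCC internals: vectorize_double plays the role of TARGET_VECTORIZE_DOUBLE, and df_elements replaces the GET_MODE_INNER (TYPE_MODE (vectype)) == DFmode check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's internal enums; the real
   vect_cost_for_stmt and vect_cost_model_location have more members.  */
enum toy_stmt_kind { TOY_SCALAR_STMT, TOY_VECTOR_STMT, TOY_VECTOR_LOAD };
enum toy_where { TOY_PROLOGUE, TOY_BODY, TOY_EPILOGUE };

struct toy_stmt
{
  enum toy_stmt_kind kind;
  bool df_elements;    /* vector element mode is DFmode */
  bool in_inner_loop;  /* statement sits in an inner loop */
};

/* Toy version of the patched hook: DFmode penalty on stmt_cost first,
   then inner-loop weighting on count, then count * stmt_cost is
   accumulated into cost[where] (assumed, not shown in the hunk).  */
static unsigned
toy_add_stmt_cost (unsigned cost[3], int count, const struct toy_stmt *s,
                   int base_cost, enum toy_where where, bool vectorize_double)
{
  int stmt_cost = base_cost;

  assert (where <= TOY_EPILOGUE);

  /* Only vector arithmetic statements are penalized, not loads/stores.  */
  if (s->kind == TOY_VECTOR_STMT && !vectorize_double && s->df_elements)
    stmt_cost *= 5;

  /* Existing heuristic: inner-loop statements count 50 times.  */
  if (where == TOY_BODY && s->in_inner_loop)
    count *= 50;

  unsigned retval = (unsigned) (count * stmt_cost);
  cost[where] += retval;
  return retval;
}
```

For a unit-cost V2DF arithmetic statement this yields 5 in the loop body and 5 × 50 = 250 inside an inner loop, while vector loads and integer vector statements keep their base cost.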