Hi!

The following two testcases FAIL to be vectorized, because SSE2 doesn't have
many permutation instructions and the ones that actually work (whole vector
shifts) aren't enabled for V4SFmode.

The following patch fixes it by enabling those optabs for V4SFmode (and
V2DFmode) as well.  Strictly speaking, we need them only for the VI_128 modes
plus V4SFmode, but I'm not sure it is worth adding yet another iterator for
VI_128 + V4SF.  The instructions do work for V2DFmode too; it is just that
other permutation instructions already handle V2DFmode.
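
For illustration only (this is not part of the patch), here is a minimal
sketch of why whole vector shifts are useful on V4SF data.  It uses the SSE2
intrinsics _mm_slli_si128 / _mm_srli_si128 (pslldq / psrldq) on a float
vector reinterpreted as an integer vector, which is what the expanders do
via gen_lowpart to V1TImode.  The helper names and the prefix-sum use case
are assumptions made for the example, not something taken from the
testcases.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: move every float one lane towards the high end,
   filling lane 0 with zero bits (pslldq $4).  */
static __m128
shl_one_lane (__m128 v)
{
  return _mm_castsi128_ps (_mm_slli_si128 (_mm_castps_si128 (v), 4));
}

/* Hypothetical helper: move every float one lane towards the low end,
   filling lane 3 with zero bits (psrldq $4).  */
static __m128
shr_one_lane (__m128 v)
{
  return _mm_castsi128_ps (_mm_srli_si128 (_mm_castps_si128 (v), 4));
}

/* Store IN shifted left / right by one lane into OUT.  */
void
shift_left_one_lane (const float in[4], float out[4])
{
  _mm_storeu_ps (out, shl_one_lane (_mm_loadu_ps (in)));
}

void
shift_right_one_lane (const float in[4], float out[4])
{
  _mm_storeu_ps (out, shr_one_lane (_mm_loadu_ps (in)));
}

/* Inclusive prefix sum of four floats using only whole vector shifts
   and adds: shift by one lane and add, then by two lanes and add.  */
void
prefix_sum4 (const float in[4], float out[4])
{
  __m128 v = _mm_loadu_ps (in);
  v = _mm_add_ps (v, shl_one_lane (v));
  v = _mm_add_ps (v, _mm_castsi128_ps
		       (_mm_slli_si128 (_mm_castps_si128 (v), 8)));
  _mm_storeu_ps (out, v);
}
```

With in = {1, 2, 3, 4}, shift_left_one_lane gives {0, 1, 2, 3},
shift_right_one_lane gives {2, 3, 4, 0} and prefix_sum4 gives {1, 3, 6, 10}.
Everything here needs only -msse2.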

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-08-28  Jakub Jelinek  <ja...@redhat.com>

        PR libgomp/91530
        * config/i386/sse.md (vec_shl_<mode>, vec_shr_<mode>): Use
        V_128 iterator instead of VI_128.

        * testsuite/libgomp.c/scan-21.c: New test.
        * testsuite/libgomp.c/scan-22.c: New test.

--- gcc/config/i386/sse.md.jj   2019-08-27 12:26:25.385089103 +0200
+++ gcc/config/i386/sse.md      2019-08-27 13:50:42.594849445 +0200
@@ -12047,9 +12047,9 @@ (define_insn "<shift_insn><mode>3<mask_n
 (define_expand "vec_shl_<mode>"
   [(set (match_dup 3)
        (ashift:V1TI
-        (match_operand:VI_128 1 "register_operand")
+        (match_operand:V_128 1 "register_operand")
         (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
-   (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
+   (set (match_operand:V_128 0 "register_operand") (match_dup 4))]
   "TARGET_SSE2"
 {
   operands[1] = gen_lowpart (V1TImode, operands[1]);
@@ -12060,9 +12060,9 @@ (define_expand "vec_shl_<mode>"
 (define_expand "vec_shr_<mode>"
   [(set (match_dup 3)
        (lshiftrt:V1TI
-        (match_operand:VI_128 1 "register_operand")
+        (match_operand:V_128 1 "register_operand")
         (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
-   (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
+   (set (match_operand:V_128 0 "register_operand") (match_dup 4))]
   "TARGET_SSE2"
 {
   operands[1] = gen_lowpart (V1TImode, operands[1]);
--- libgomp/testsuite/libgomp.c/scan-21.c.jj    2019-08-27 22:56:03.805127837 +0200
+++ libgomp/testsuite/libgomp.c/scan-21.c       2019-08-27 22:58:26.347043679 +0200
@@ -0,0 +1,6 @@
+/* { dg-require-effective-target size32plus } */
+/* { dg-require-effective-target avx_runtime } */
+/* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details -msse2 -mno-sse3" } */
+/* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" } } */
+
+#include "scan-13.c"
--- libgomp/testsuite/libgomp.c/scan-22.c.jj    2019-08-27 22:56:51.034437425 +0200
+++ libgomp/testsuite/libgomp.c/scan-22.c       2019-08-27 22:59:01.978522645 +0200
@@ -0,0 +1,6 @@
+/* { dg-require-effective-target size32plus } */
+/* { dg-require-effective-target avx_runtime } */
+/* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details -msse2 -mno-sse3" } */
+/* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" } } */
+
+#include "scan-17.c"

        Jakub
