On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote:
Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)),
optimizer wouldn't simplify if op0 has different mode with M, but that's too
restrict which will prevent below optimization, the condition can be relaxed
to op0 must have same inner mode with M.
(set (reg:V2DF 87 [ xx ])
(vec_concat:V2DF (vec_select:DF (reg:V4DF 92)
(parallel [
(const_int 2 [0x2])
]))
(vec_select:DF (reg:V4DF 92)
(parallel [
(const_int 3 [0x3])
]))))
Bootsrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
* simplify-rtx.c
(simplify_context::simplify_binary_operation_1): Relax
condition of simplifying (vec_concat:M (vec_select op0
index0)(vec_select op1 index1)) to allow different modes
between op0 and M, but have same inner mode.
gcc/testsuite/ChangeLog:
* gcc.target/i386/vect-rebuild.c:
* gcc.target/i386/avx512f-vect-rebuild.c: New test.
Funny, I was looking at something rather similar recently, but never
pushed on it because we were going to need too many entries in the
parallel selector.
I'm not convinced that we need the inner mode to match anything. As
long as the vec_concat's mode is twice the size of the vec_select modes
and the vec_select mode is <= the mode of its operands ISTM this is
fine. We might want the modes of the vec_select to match, but I don't
think that's strictly necessary either, they just need to be the same
size. ie, we could have somethig like
(vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))
I'm not sure if that level of generality is useful though. If we want
the modes of the vec_selects to match I think we could still support
(vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
Thoughts?
jeff
Jeff
---
gcc/simplify-rtx.c | 3 ++-
.../gcc.target/i386/avx512f-vect-rebuild.c | 21 +++++++++++++++++++
gcc/testsuite/gcc.target/i386/vect-rebuild.c | 2 +-
3 files changed, 24 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index ebad5cb5a79..16286befd79 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code
code,
if (GET_CODE (trueop0) == VEC_SELECT
&& GET_CODE (trueop1) == VEC_SELECT
&& rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0))
- && GET_MODE (XEXP (trueop0, 0)) == mode)
+ && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
+ == GET_MODE_INNER(mode))
{
rtx par0 = XEXP (trueop0, 1);
rtx par1 = XEXP (trueop1, 1);
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
new file mode 100644
index 00000000000..aef6855aa46
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */
+
+typedef double v2df __attribute__ ((__vector_size__ (16)));
+typedef double v4df __attribute__ ((__vector_size__ (32)));
+
+v2df h (v4df x)
+{
+ v2df xx = { x[2], x[3] };
+ return xx;
+}
+
+v4df f2 (v4df x)
+{
+ v4df xx = { x[0], x[1], x[2], x[3] };
+ return xx;
+}
+
+/* { dg-final { scan-assembler-not "unpck" } } */
+/* { dg-final { scan-assembler-not "valign" } } */
+/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } }
*/
diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c
b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
index 570967f6b5c..8e85b98bf1d 100644
--- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c
+++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
@@ -30,4 +30,4 @@ v2df h (v4df x)
/* { dg-final { scan-assembler-not "unpck" } } */
/* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */
-/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */
+/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } }
*/