The following fixes an issue in the RTL combiner where we correctly
combine two vector sign-extends with a vector load

Trying 7, 9 -> 10:
    7: r106:V4QI=[r119:DI]
      REG_DEAD r119:DI
    9: r108:V4HI=sign_extend(vec_select(r106:V4QI#0,parallel))
   10: r109:V4SI=sign_extend(vec_select(r108:V4HI#0,parallel))
      REG_DEAD r108:V4HI

to

modifying insn i2     9: r109:V4SI=sign_extend([r119:DI])

but since r106 is used we wrongly materialize it using a subreg:

modifying insn i3    10: r106:V4QI=r109:V4SI#0

which of course does not work for modes with more than one component.
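
To illustrate why the lowpart subreg is fine for scalars but not for
vectors, here is a minimal sketch in plain C.  It only models the RTL
semantics with arrays; the helper names are hypothetical, nothing here
is combine.cc code, and it assumes a little-endian target such as
x86_64.

```c
#include <string.h>

/* Hypothetical model of (sign_extend:V4SI (reg:V4QI)):
   every lane is widened on its own.  */
static void
sign_extend_v4qi_to_v4si (const signed char in[4], int out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = in[i];
}

/* Hypothetical model of (subreg:V4QI (reg:V4SI) 0): reinterpret the
   low four bytes of the wide register.  Assumes little-endian, where
   those bytes all belong to lane 0.  */
static void
lowpart_v4qi_of_v4si (const int in[4], signed char out[4])
{
  memcpy (out, in, 4);
}

/* Scalar counterpart, (subreg:QI (reg:SI) 0): the low byte of a
   sign-extended value is the original value (little-endian assumed).  */
static signed char
lowpart_qi_of_si (int x)
{
  signed char r;
  memcpy (&r, &x, 1);
  return r;
}
```

With the testcase input {1, 1, 1, -1} the vector round trip yields
{1, 0, 0, 0} under these assumptions, so num[3] no longer reads as
negative, while the scalar round trip returns the original value.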

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

Note the check allows subreg:V1QI reg:V1SI (which I think is OK).
There's no SCALAR_MODE_P; maybe the other checks guarantee it's
an integer mode, so possibly SCALAR_INT_MODE_P covers everything
important (it wouldn't cover V1QI, not that that's important).

OK?  Or do you prefer a different check - which?

Thanks,
Richard.

        PR rtl-optimization/118662
        * combine.cc (try_combine): When re-materializing a load
        from an extended reg by a lowpart subreg make sure we're
        dealing with single-component modes.

        * gcc.dg/torture/pr118662.c: New testcase.
---
 gcc/combine.cc                          |  5 +++++
 gcc/testsuite/gcc.dg/torture/pr118662.c | 18 ++++++++++++++++++
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr118662.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a2d4387cebe..4849603ba5e 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -3904,6 +3904,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
      copy.  This saves at least one insn, more if register allocation can
      eliminate the copy.
 
+     We cannot do this if the involved modes have more than one element,
+     as is the case for vector or complex modes.
+
      We cannot do this if the destination of the first assignment is a
      condition code register.  We eliminate this case by making sure
      the SET_DEST and SET_SRC have the same mode.
@@ -3919,6 +3922,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
           && GET_CODE (SET_SRC (XVECEXP (newpat, 0, 0))) == SIGN_EXTEND
           && (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0)))
               == GET_MODE (SET_SRC (XVECEXP (newpat, 0, 0))))
+          && known_eq (GET_MODE_NUNITS
+                         (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0)))), 1)
           && GET_CODE (XVECEXP (newpat, 0, 1)) == SET
           && rtx_equal_p (SET_SRC (XVECEXP (newpat, 0, 1)),
                           XEXP (SET_SRC (XVECEXP (newpat, 0, 0)), 0))
diff --git a/gcc/testsuite/gcc.dg/torture/pr118662.c b/gcc/testsuite/gcc.dg/torture/pr118662.c
new file mode 100644
index 00000000000..b9e8cca0aeb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr118662.c
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" } */
+/* { dg-additional-options "-msse4" { target sse4_runtime} } */
+
+int __attribute__((noipa)) addup(signed char *num) {
+  int val = num[0] + num[1] + num[2] + num[3];
+  if (num[3] >= 0)
+    val++;
+  return val;
+}
+
+int main(int, char *[])
+{
+  signed char num[4] = {1, 1, 1, -1};
+  if (addup(num) != 2)
+    __builtin_abort();
+  return 0;
+}
-- 
2.43.0
