Hi,

PR91275 observes that __builtin_crypto_vpmsumd fails to work properly
with -O1 or higher with -mcpu=power8.  That combination spells swap
optimization.  Indeed, all vpmsum* instructions were being accepted
as swappable operations.  This is correct for all of them but vpmsumd,
which creates a 128-bit result.

The -std=gnu11 in the testcase is there to avoid errors about long long
not being accepted with pure ANSI.  The "11" part is arbitrary.  The
testcase is modified from the original bug report.

This patch disallows swap optimization in the presence of vpmsumd.
Bootstrapped and tested on powerpc64le-unknown-linux-gnu.  Is this okay
for trunk, and for backport to all active branches after an appropriate
waiting period?

Thanks,
Bill


[gcc]

2019-09-30  Bill Schmidt  <wschm...@linux.ibm.com>

        * config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Don't swap
        vpmsumd.

[gcc/testsuite]

2019-09-30  Bill Schmidt  <wschm...@linux.ibm.com>

        * gcc.target/powerpc/pr91275.c: New.


Index: gcc/config/rs6000/rs6000-p8swap.c
===================================================================
--- gcc/config/rs6000/rs6000-p8swap.c   (revision 276360)
+++ gcc/config/rs6000/rs6000-p8swap.c   (working copy)
@@ -791,6 +791,11 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
          case UNSPEC_REDUC_PLUS:
          case UNSPEC_REDUC:
            return 1;
+         case UNSPEC_VPMSUM:
+           /* vpmsumd is not swappable, but vpmsum[bhw] are.  */
+           if (GET_MODE (op) == V2DImode)
+             return 0;
+           break;
          }
       }
Index: gcc/testsuite/gcc.target/powerpc/pr91275.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr91275.c  (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr91275.c  (working copy)
@@ -0,0 +1,21 @@
+/* Test that we generate vpmsumd correctly without a swap error.  */
+
+/* { dg-do run { target { p8vector_hw } } } */
+/* { dg-options "-O2 -std=gnu11" } */
+
+#include <altivec.h>
+
+int main() {
+
+  const unsigned long long r0l = 0x8e7dfceac070e3a0;
+  vector unsigned long long r0 = (vector unsigned long long) {r0l, 0}, v;
+  const vector unsigned long long pd
+    = (vector unsigned long) {0xc2LLU << 56, 0};
+
+  v = __builtin_crypto_vpmsumd ((vector unsigned long long) {r0[0], 0}, pd);
+
+  if (v[0] != 0x4000000000000000 || v[1] != 0x65bd7ab605a4a8ff)
+    __builtin_abort ();
+
+  return 0;
+}

Reply via email to