Hi,

When a shuffle involves more than one input vector, on NEON we end up with a 'mixed-endian' layout in the register list that TBL operates on. We don't make this correction in RTL, so the shuffle produces incorrect results. Here is a patch that fixes up the index table in the selector rtx in RTL to also be mixed-endian, reflecting what happens on NEON.
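
To illustrate the remapping outside of GCC, here is a minimal standalone sketch. The helper names be_tbl_index and be_tbl_selector are hypothetical and exist only for this example; they are not part of the patch. The sketch assumes the number of elements per input vector (nunits) is a power of two, which is what lets the patch use "& nunits" to test whether an index refers to the second input.

```c
#include <assert.h>

/* Hypothetical helper (not part of GCC): remap one selector index the
   way the patch does when BYTES_BIG_ENDIAN is set.  NUNITS is the number
   of elements per input vector and is assumed to be a power of two.  */
int
be_tbl_index (int idx, int nunits, int one_vector_p)
{
  assert (nunits > 0 && (nunits & (nunits - 1)) == 0);
  assert (idx >= 0 && idx < (one_vector_p ? nunits : 2 * nunits));

  if (!one_vector_p && (idx & nunits))
    /* Index selects from the second input: reverse it within that
       vector, then add back the offset of the first vector.  */
    return nunits + (nunits - 1 - (idx & (nunits - 1)));

  /* Index selects from the first (or only) input: plain reversal.  */
  return nunits - 1 - idx;
}

/* Hypothetical helper: remap a whole selector of NELT indices in place.  */
void
be_tbl_selector (int *sel, int nelt, int nunits, int one_vector_p)
{
  for (int i = 0; i < nelt; ++i)
    sel[i] = be_tbl_index (sel[i], nunits, one_vector_p);
}
```

With two four-element inputs, for example, the selector {0, 5, 2, 7} becomes {3, 6, 1, 4}: each index is reversed within its own input vector rather than across the concatenated pair.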

As trunk stands, this patch will not be exercised, as constant vector permute is disabled for big-endian. I've tested this by locally enabling const vec_perm, and it fixes some regressions we have on big-endian:

aarch64_be-none-elf:
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer -funroll-loops
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -g
FAIL->PASS: gcc.dg/torture/vector-shuffle1.c  -O0  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2df.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2di.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2si.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v4si.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c  -O2  execution test
FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test
FAIL->PASS: gcc.dg/vect/vect-114.c execution test
FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test
FAIL->PASS: gcc.dg/vect/vect-15.c execution test

Also regression-tested on aarch64-none-elf.

OK for stage-1?

Thanks,
Tejas.

2014-02-21  Tejas Belagod  <tejas.bela...@arm.com>

gcc/
        * config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for
        big-endian when dealing with more than one input shuffle vector.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ea90311..fd473a3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8128,7 +8128,28 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d)
     return false;

   for (i = 0; i < nelt; ++i)
-    rperm[i] = GEN_INT (d->perm[i]);
+    {
+      int nunits = GET_MODE_NUNITS (vmode);
+      int elt = d->perm[i];
+
+      /* If two vectors, we end up with a weird mixed-endian mode on NEON.  */
+      if (BYTES_BIG_ENDIAN)
+       {
+         if (!d->one_vector_p && (d->perm[i] & nunits))
+           {
+             /* Extract the offset.  */
+             elt = d->perm[i] & (nunits - 1);
+             /* Reverse the top half.  */
+             elt = nunits - 1 - elt;
+             /* Offset it by the bottom half.  */
+             elt += nunits;
+           }
+         else
+           elt = nunits - 1 - d->perm[i];
+       }
+
+      rperm[i] = GEN_INT (elt);
+    }
   sel = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rperm));
   sel = force_reg (vmode, sel);
