Hi,
When a shuffle takes more than one input vector, the register list that TBL operates on ends up in a 'mixed-endian' format on NEON. We don't make this correction in RTL, so the shuffle operation produces incorrect results. This patch fixes up the index table in the selector rtx in RTL so that it is also mixed-endian, reflecting what happens on NEON.
As trunk stands, this patch will not be exercised, because constant vector permute for big-endian is disabled. I've tested it by locally enabling const vec_perm, and it fixes some of the regressions we have on big-endian:
aarch64_be-none-elf:
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer -funroll-loops
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -g
FAIL->PASS: gcc.dg/torture/vector-shuffle1.c -O0 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2df.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2di.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2si.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v4si.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c -O2 execution test
FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test
FAIL->PASS: gcc.dg/vect/vect-114.c execution test
FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test
FAIL->PASS: gcc.dg/vect/vect-15.c execution test

Also regressed on aarch64-none-elf.

OK for stage-1?

Thanks,
Tejas.

2014-02-21  Tejas Belagod  <tejas.bela...@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for
	big-endian when dealing with more than one input shuffle vector.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ea90311..fd473a3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8128,7 +8128,28 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d)
     return false;
 
   for (i = 0; i < nelt; ++i)
-    rperm[i] = GEN_INT (d->perm[i]);
+    {
+      int nunits = GET_MODE_NUNITS (vmode);
+      int elt = d->perm[i];
+
+      /* If two vectors, we end up with a wierd mixed-endian mode on NEON. */
+      if (BYTES_BIG_ENDIAN)
+        {
+          if (!d->one_vector_p && d->perm[i] & nunits)
+            {
+              /* Extract the offset. */
+              elt = d->perm[i] & (nunits - 1);
+              /* Reverse the top half. */
+              elt = nunits - 1 - elt;
+              /* Offset it by the bottom half. */
+              elt += nunits;
+            }
+          else
+            elt = nunits - 1 - d->perm[i];
+        }
+
+      rperm[i] = GEN_INT (elt);
+    }
   sel = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rperm));
   sel = force_reg (vmode, sel);
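
For anyone who wants to check the index arithmetic outside the compiler, here is a minimal standalone sketch of the same fix-up in plain C. The names remap_tbl_index and remap_tbl_selector and their parameters are hypothetical and are not part of GCC; the flags big_endian and one_vector_p stand in for BYTES_BIG_ENDIAN and d->one_vector_p, and nunits is assumed to be a power of two, as it is for the vector modes in question.

```c
/* Hypothetical helper, not a GCC function: mirrors the index fix-up in the
   patch above for a single selector element.  NUNITS is assumed to be a
   power of two, so (idx & nunits) tests whether IDX selects from the second
   input vector.  */
static int
remap_tbl_index (int idx, int nunits, int one_vector_p, int big_endian)
{
  if (!big_endian)
    return idx;                     /* Little-endian: index is unchanged.  */

  if (!one_vector_p && (idx & nunits))
    {
      int elt = idx & (nunits - 1); /* Extract the offset.  */
      elt = nunits - 1 - elt;       /* Reverse within the top half.  */
      return elt + nunits;          /* Offset it by the bottom half.  */
    }

  /* One input vector, or an index into the first vector: plain reversal.  */
  return nunits - 1 - idx;
}

/* Hypothetical helper: apply the fix-up to a whole selector of NELT
   entries, writing the result to OUT.  */
static void
remap_tbl_selector (const int *perm, int nelt, int nunits,
                    int one_vector_p, int big_endian, int *out)
{
  for (int i = 0; i < nelt; ++i)
    out[i] = remap_tbl_index (perm[i], nunits, one_vector_p, big_endian);
}
```

With nunits = 4 and two input vectors on big-endian, indices 0..7 map to 3, 2, 1, 0, 7, 6, 5, 4: each half is reversed independently, which is the 'mixed-endian' layout described above.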