Hi,
Due to differences in how the ARM C Language Extensions and gcc's vector
extensions deal with indices within vectors, the __builtin_shuffle masks
used to implement the ZIP, UZP and TRN Neon Intrinsics in arm_neon.h are
correct only for little-endian. (The problem on bigendian has recently been
revealed by new tests in gcc.target/arm/simd/.)
This patch corrects the indices using "#ifdef __ARM_BIG_ENDIAN" through
arm_neon.h. I've tested all the arm-specific tests (arm.exp acle.exp
aapcs.exp simd.exp neon.exp) on both arm-none-eabi and armeb-none-eabi and
there are no regressions, and on armeb-none-eabi this patch fixes FAIL ->
PASS for simd/v{uzp,zip,trn}*_1.c.
Note the patch also modifies gcc.target/arm/pr48252.c. A bit of diving into
the history of this test reveals
*the test was first written in the days when the arm_neon.h
implementation used builtins such as __builtin_neon_vzipv8qi (which were
thus correct for bigendian).
*In SVN rev 189294, ZIP intrinsics were rewritten to use
__builtin_shuffle (with little-endian masks); this broke pr48252.c on
bigendian, but this was not detected until...
*In SVN rev 191200, in which pr48252.c was modified to expect different
results according to endianness - that is, "fixing" the test to match the
broken implementation. (I have verified that this updated test failed on the
original __builtin_neon_vzipv8qi implementation.)
The fix to pr48252.c here largely reverts to the original form although
keeps the (correct, proper) use of vld1.
gcc/ChangeLog:
* config/arm/arm_neon.h (vtrn_s8, vtrn_s16, vtrn_u8, vtrn_u16,
vtrn_p8,
vtrn_p16, vtrn_s32, vtrn_f32, vtrn_u32, vtrnq_s8, vtrnq_s16,
vtrnq_s32,
vtrnq_f32, vtrnq_u8, vtrnq_u16, vtrnq_u32, vtrnq_p8, vtrnq_p16,
vzip_s8,
vzip_s16, vzip_u8, vzip_u16, vzip_p8, vzip_p16, vzip_s32, vzip_f32,
vzip_u32, vzipq_s8, vzipq_s16, vzipq_s32, vzipq_f32, vzipq_u8,
vzipq_u16, vzipq_u32, vzipq_p8, vzipq_p16, vuzp_s8, vuzp_s16,
vuzp_s32,
vuzp_f32, vuzp_u8, vuzp_u16, vuzp_u32, vuzp_p8, vuzp_p16, vuzpq_s8,
vuzpq_s16, vuzpq_s32, vuzpq_f32, vuzpq_u8, vuzpq_u16, vuzpq_u32,
vuzpq_p8, vuzpq_p16): Correct mask for bigendian.
gcc/testsuite/ChangeLog:
* gcc.target/arm/pr48252.c (main): Expect same result as
endian-neutral.
Cheers, Alan