On Wed, May 14, 2014 at 2:52 PM, Alan Lawrence <alan.lawre...@arm.com> wrote: > Hi, > > Due to differences in how the ARM C Language Extensions and gcc's vector > extensions deal with indices within vectors, the __builtin_shuffle masks > used to implement the ZIP, UZP and TRN Neon Intrinsics in arm_neon.h are > correct only for little-endian. (The problem on bigendian has recently been > revealed by new tests in gcc.target/arm/simd/.) > > This patch corrects the indices using "#ifdef __ARM_BIG_ENDIAN" through > arm_neon.h. I've tested all the arm-specific tests (arm.exp acle.exp > aapcs.exp simd.exp neon.exp) on both arm-none-eabi and armeb-none-eabi and > there are no regressions, and on armeb-none-eabi this patch fixes FAIL -> > PASS for simd/v{uzp,zip,trn}*_1.c. > > Note the patch also modifies gcc.target/arm/pr48252.c. A bit of diving into > the history of this test reveals > *the test was first written in the days when the arm_neon.h > implementation used builtins such as __builtin_neon_vzipv8qi (which were > thus correct for bigendian). > *In SVN rev 189294, ZIP intrinsics were rewritten to use > __builtin_shuffle (with little-endian masks); this broke pr48252.c on > bigendian, but this was not detected until... > *In SVN rev 191200, in which pr48252.c was modified to expect different > results according to endianness - that is, "fixing" the test to match the > broken implementation. (I have verified that this updated test failed on the > original __builtin_neon_vzipv8qi implementation.)
Thanks for fixing this up - This patch is ok. Please apply to trunk and backport to 4.9 after letting it bake on trunk for a few days. I suspect 4.8 is also afflicted from a quick look - can you please double check and then apply. If this goes in we need to kill the ml for 4.8 and 4.9 now. regards Ramana > > The fix to pr48252.c here largely reverts to the original form although > keeps the (correct, proper) use of vld1. > > gcc/ChangeLog: > > * config/arm/arm_neon.h (vtrn_s8, vtrn_s16, vtrn_u8, vtrn_u16, > vtrn_p8, > vtrn_p16, vtrn_s32, vtrn_f32, vtrn_u32, vtrnq_s8, vtrnq_s16, > vtrnq_s32, > vtrnq_f32, vtrnq_u8, vtrnq_u16, vtrnq_u32, vtrnq_p8, vtrnq_p16, > vzip_s8, > vzip_s16, vzip_u8, vzip_u16, vzip_p8, vzip_p16, vzip_s32, vzip_f32, > vzip_u32, vzipq_s8, vzipq_s16, vzipq_s32, vzipq_f32, vzipq_u8, > vzipq_u16, vzipq_u32, vzipq_p8, vzipq_p16, vuzp_s8, vuzp_s16, > vuzp_s32, > vuzp_f32, vuzp_u8, vuzp_u16, vuzp_u32, vuzp_p8, vuzp_p16, vuzpq_s8, > vuzpq_s16, vuzpq_s32, vuzpq_f32, vuzpq_u8, vuzpq_u16, vuzpq_u32, > vuzpq_p8, vuzpq_p16): Correct mask for bigendian. > > gcc/testsuite/ChangeLog: > > * gcc.target/arm/pr48252.c (main): Expect same result as > endian-neutral. > > Cheers, Alan