Hi This patch addresses incorrect recognition of VEC_PERM_EXPRs as VUZP and VZIP on armeb-* targets. It also fixes the definition of the vuzpq_* and vzipq_* NEON intrinsics which use incorrect lane specifiers in the use of __builtin_shuffle().
The problem with arm_neon.h can be seen by temporarily altering arm_expand_vec_perm_const_1() to unconditionally return false. If this is done, the vuzp/vzip tests in the advsimd execution tests will fail. With these patches, this is no longer the case. The problem is caused by the weird mapping of architectural lane order to gcc lane order in big endian. For 64 bit vectors, the order is simply reversed, but 128 bit vectors are treated as 2 64 bit vectors where the lane ordering is reversed inside those. This is due to the memory ordering defined by the EABI. There is a large comment in gcc/config/arm.c above output_move_neon() which describes this in more detail. The arm_evpc_neon_vuzp() and arm_evpc_neon_vzip() functions do not allow for this lane order, instead treating the lane order as simply reversed in 128 bit vectors. These patches fix this. I have included a test case for vuzp, but I don't have one for vzip. Tested with make check on arm-unknown-linux-gnueabihf with no regressions Tested with make check on armeb-unknown-linux-gnueabihf. Some gcc.dg/vect tests fail due to no longer being vectorized. I haven't analysed these, but it is expected since vuzp is not usable for the shuffle patterns for which it was previously used. There are also a few new PASSes. Patch 1 (vuzp): gcc/ChangeLog: 2015-12-15 Charles Baylis <charles.bay...@linaro.org> * config/arm/arm.c (arm_neon_endian_lane_map): New function. (arm_neon_vector_pair_endian_lane_map): New function. (arm_evpc_neon_vuzp): Allow for big endian lane order. * config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big endian. (vuzpq_s16): Likewise. (vuzpq_s32): Likewise. (vuzpq_f32): Likewise. (vuzpq_u8): Likewise. (vuzpq_u16): Likewise. (vuzpq_u32): Likewise. (vuzpq_p8): Likewise. (vuzpq_p16): Likewise. gcc/testsuite/ChangeLog: 2015-12-15 Charles Baylis <charles.bay...@linaro.org> * gcc.c-torture/execute/pr68532.c: New test. Patch 2 (vzip) gcc/ChangeLog: 2015-12-15 Charles Baylis <charles.bay...@linaro.org> * config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane order. * config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big endian. (vzipq_s16): Likewise. (vzipq_s32): Likewise. (vzipq_f32): Likewise. (vzipq_u8): Likewise. (vzipq_u16): Likewise. (vzipq_u32): Likewise. (vzipq_p8): Likewise. (vzipq_p16): Likewise.
0001-ARM-Fix-up-vuzp-for-big-endian.patch
Description: application/download
0002-ARM-Fix-up-vzip-recognition-for-big-endian.patch
Description: application/download