Hi

This patch addresses incorrect recognition of VEC_PERM_EXPRs as VUZP
and VZIP on armeb-* targets. It also fixes the definition of the
vuzpq_* and vzipq_*  NEON intrinsics which use incorrect lane
specifiers in the use of __builtin_shuffle().

The problem with arm_neon.h can be seen by temporarily altering
arm_expand_vec_perm_const_1() to unconditionally return false. If this
is done, the vuzp/vzip tests in the advsimd execution tests will fail.
With these patches, this is no longer the case.

The problem is caused by the weird mapping of architectural lane order
to gcc lane order in big endian. For 64 bit vectors, the order is
simply reversed, but 128 bit vectors are treated as 2 64 bit vectors
where the lane ordering is reversed inside those. This is due to the
memory ordering defined by the EABI. There is a large comment in
gcc/config/arm.c above output_move_neon() which describes this in more
detail.

The arm_evpc_neon_vuzp() and  arm_evpc_neon_vzip() functions do not
allow for this lane order, instead treating the lane order as simply
reversed in 128 bit vectors. These patches fix this. I have included a
test case for vuzp, but I don't have one for vzip.

Tested with make check on arm-unknown-linux-gnueabihf with no regressions
Tested with make check on armeb-unknown-linux-gnueabihf. Some
gcc.dg/vect tests fail due to no longer being vectorized. I haven't
analysed these, but it is expected since vuzp is not usable for the
shuffle patterns for which it was previously used. There are also a
few new PASSes.


Patch 1 (vuzp):

gcc/ChangeLog:

2015-12-15  Charles Baylis  <charles.bay...@linaro.org>

        * config/arm/arm.c (arm_neon_endian_lane_map): New function.
        (arm_neon_vector_pair_endian_lane_map): New function.
        (arm_evpc_neon_vuzp): Allow for big endian lane order.
        * config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big
        endian.
        (vuzpq_s16): Likewise.
        (vuzpq_s32): Likewise.
        (vuzpq_f32): Likewise.
        (vuzpq_u8): Likewise.
        (vuzpq_u16): Likewise.
        (vuzpq_u32): Likewise.
        (vuzpq_p8): Likewise.
        (vuzpq_p16): Likewise.

gcc/testsuite/ChangeLog:

2015-12-15  Charles Baylis  <charles.bay...@linaro.org>

        * gcc.c-torture/execute/pr68532.c: New test.


Patch 2 (vzip)

gcc/ChangeLog:

2015-12-15  Charles Baylis  <charles.bay...@linaro.org>

        * config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane
        order.
        * config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big
        endian.
        (vzipq_s16): Likewise.
        (vzipq_s32): Likewise.
        (vzipq_f32): Likewise.
        (vzipq_u8): Likewise.
        (vzipq_u16): Likewise.
        (vzipq_u32): Likewise.
        (vzipq_p8): Likewise.
        (vzipq_p16): Likewise.

Attachment: 0001-ARM-Fix-up-vuzp-for-big-endian.patch
Description: application/download

Attachment: 0002-ARM-Fix-up-vzip-recognition-for-big-endian.patch
Description: application/download

Reply via email to