TRN for bigendian

Alan Lawrence Mon, 09 Jun 2014 03:08:35 -0700

Baking on trunk as of rev 211369.

--Alan


Ramana Radhakrishnan wrote:

On Wed, May 14, 2014 at 2:52 PM, Alan Lawrence <alan.lawre...@arm.com> wrote:

Hi,

Due to differences in how the ARM C Language Extensions and gcc's vector
extensions deal with indices within vectors, the __builtin_shuffle masks
used to implement the ZIP, UZP and TRN Neon Intrinsics in arm_neon.h are
correct only for little-endian. (The problem on bigendian has recently been
revealed by new tests in gcc.target/arm/simd/.)

This patch corrects the indices using "#ifdef __ARM_BIG_ENDIAN" through
arm_neon.h. I've tested all the arm-specific tests (arm.exp acle.exp
aapcs.exp simd.exp neon.exp) on both arm-none-eabi and armeb-none-eabi and
there are no regressions, and on armeb-none-eabi this patch fixes FAIL ->
PASS for simd/v{uzp,zip,trn}*_1.c.

Note the patch also modifies gcc.target/arm/pr48252.c. A bit of diving into
the history of this test reveals
   *the test was first written in the days when the arm_neon.h
implementation used builtins such as __builtin_neon_vzipv8qi (which were
thus correct for bigendian).
   *In SVN rev 189294, ZIP intrinsics were rewritten to use
__builtin_shuffle (with little-endian masks); this broke pr48252.c on
bigendian, but this was not detected until...
   *In SVN rev 191200, in which pr48252.c was modified to expect different
results according to endianness - that is, "fixing" the test to match the
broken implementation. (I have verified that this updated test failed on the
original __builtin_neon_vzipv8qi implementation.)


Thanks for fixing this up - This patch is ok. Please apply to trunk
and backport to 4.9 after letting it bake on trunk for a few days.

I suspect 4.8 is also afflicted from a quick look - can you please
double check and then apply.

If this goes in we need to kill the ml for 4.8 and 4.9 now.

regards
Ramana

The fix to pr48252.c here largely reverts to the original form although
keeps the (correct, proper) use of vld1.

gcc/ChangeLog:

        * config/arm/arm_neon.h (vtrn_s8, vtrn_s16, vtrn_u8, vtrn_u16,
vtrn_p8,
        vtrn_p16, vtrn_s32, vtrn_f32, vtrn_u32, vtrnq_s8, vtrnq_s16,
vtrnq_s32,
        vtrnq_f32, vtrnq_u8, vtrnq_u16, vtrnq_u32, vtrnq_p8, vtrnq_p16,
vzip_s8,
        vzip_s16, vzip_u8, vzip_u16, vzip_p8, vzip_p16, vzip_s32, vzip_f32,
        vzip_u32, vzipq_s8, vzipq_s16, vzipq_s32, vzipq_f32, vzipq_u8,
        vzipq_u16, vzipq_u32, vzipq_p8, vzipq_p16, vuzp_s8, vuzp_s16,
vuzp_s32,
        vuzp_f32, vuzp_u8, vuzp_u16, vuzp_u32, vuzp_p8, vuzp_p16, vuzpq_s8,
        vuzpq_s16, vuzpq_s32, vuzpq_f32, vuzpq_u8, vuzpq_u16, vuzpq_u32,
        vuzpq_p8, vuzpq_p16): Correct mask for bigendian.

gcc/testsuite/ChangeLog:

        * gcc.target/arm/pr48252.c (main): Expect same result as
endian-neutral.

Cheers, Alan

Re: [PATCH ARM] PR/61062 Fix arm_neon.h ZIP/UZP/TRN for bigendian

Reply via email to