On 1 May 2018 at 05:05, Sameera Deshpande <sameera.deshpa...@linaro.org> wrote:
> On 13 April 2018 at 20:21, James Greenhalgh <james.greenha...@arm.com> wrote:
>> On Fri, Apr 13, 2018 at 03:39:32PM +0100, Sameera Deshpande wrote:
>>> On Fri 13 Apr, 2018, 8:04 PM James Greenhalgh, 
>>> <james.greenha...@arm.com<mailto:james.greenha...@arm.com>> wrote:
>>> On Fri, Apr 06, 2018 at 08:55:47PM +0100, Christophe Lyon wrote:
>>> > Hi,
>>> >
>>> > 2018-04-06 12:15 GMT+02:00 Sameera Deshpande 
>>> > <sameera.deshpa...@linaro.org<mailto:sameera.deshpa...@linaro.org>>:
>>> > > Hi Christophe,
>>> > >
>>> > > Please find attached the updated patch with testcases.
>>> > >
>>> > > Ok for trunk?
>>> >
>>> > Thanks for the update.
>>> >
>>> > Since the new intrinsics are only available on aarch64, you want to
>>> > prevent the tests from running on arm.
>>> > Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two 
>>> > targets.
>>> > There are several examples on how to do that in that directory.
>>> >
>>> > I have also noticed that the tests fail at execution on aarch64_be.
>>>
>>> I think this is important to fix. We don't want the big-endian target to 
>>> have
>>> failing implementations of the Neon intrinsics. What is the nature of the
>>> failure?
>>>
>>> From what I can see, nothing in the patch prevents using these intrinsics
>>> on big-endian, so either the intrinsics behaviour is wrong (we have a wrong
>>> code bug), or the testcase expected behaviour is wrong.
>>>
>>> I don't think disabling the test for big-endian is the right fix. We should
>>> either fix the intrinsics, or fix the testcase.
>>>
>>> Thanks,
>>> James
>>>
>>> Hi James,
>>>
>>> As the tests assume the little endian order of elements while checking the
>>> results, the tests are failing for big endian targets. So, the failures are
>>> not because of intrinsic implementations, but because of the testcase.
>>
>> The testcase is a little hard to follow through the macros, but why would
>> this be the case?
>>
>> ld1 is deterministic on big and little endian for which elements will be
>> loaded from memory, as is st1.
>>
>> My expectation would be that:
>>
>>   int __attribute__ ((noinline))
>>   test_vld_u16_x3 ()
>>   {
>>     uint16_t data[3 * 3];
>>     uint16_t temp[3 * 3];
>>     uint16x4x3_t vectors;
>>     int i,j;
>>     for (i = 0; i < 3 * 3; i++)
>>       data [i] = (uint16_t) 3*i;
>>     asm volatile ("" : : : "memory");
>>     vectors = vld1_u16_x3 (data);
>>     vst1_u16 (temp, vectors.val[0]);
>>     vst1_u16 (&temp[3], vectors.val[1]);
>>     vst1_u16 (&temp[3 * 2], vectors.val[2]);
>>     asm volatile ("" : : : "memory");
>>     for (j = 0; j < 3 * 3; j++)
>>       if (temp[j] != data[j])
>>         return 1;
>>     return 0;
>>   }
>>
>> would work equally well for big- or little-endian.
>>
>> I think this is more likely to be an intrinsics implementation bug.
>>
>> Thanks,
>> James
>>
>
> Hi James,
>
> Please find attached the updated patch, which now passes for little as
> well as big endian.
> Ok for trunk?
>
> --
> - Thanks and regards,
>   Sameera D.
>
> gcc/Changelog:
>
> 2018-05-01  Sameera Deshpande  <sameera.deshpa...@linaro.org>
>
>
>         * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
>         (st1x2): Likewise.
>         (st1x3): Likewise.
>         * config/aarch64/aarch64-simd.md
>         (aarch64_ld1x3<VALLDIF:mode>): New pattern.
>         (aarch64_ld1_x3_<mode>): Likewise
>         (aarch64_st1x2<VALLDIF:mode>): Likewise
>         (aarch64_st1_x2_<mode>): Likewise
>         (aarch64_st1x3<VALLDIF:mode>): Likewise
>         (aarch64_st1_x3_<mode>): Likewise
>         * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
>         (vld1_s8_x3): Likewise.
>         (vld1_u16_x3): Likewise.
>         (vld1_s16_x3): Likewise.
>         (vld1_u32_x3): Likewise.
>         (vld1_s32_x3): Likewise.
>         (vld1_u64_x3): Likewise.
>         (vld1_s64_x3): Likewise.
>         (vld1_f16_x3): Likewise.
>         (vld1_f32_x3): Likewise.
>         (vld1_f64_x3): Likewise.
>         (vld1_p8_x3): Likewise.
>         (vld1_p16_x3): Likewise.
>         (vld1_p64_x3): Likewise.
>         (vld1q_u8_x3): Likewise.
>         (vld1q_s8_x3): Likewise.
>         (vld1q_u16_x3): Likewise.
>         (vld1q_s16_x3): Likewise.
>         (vld1q_u32_x3): Likewise.
>         (vld1q_s32_x3): Likewise.
>         (vld1q_u64_x3): Likewise.
>         (vld1q_s64_x3): Likewise.
>         (vld1q_f16_x3): Likewise.
>         (vld1q_f32_x3): Likewise.
>         (vld1q_f64_x3): Likewise.
>         (vld1q_p8_x3): Likewise.
>         (vld1q_p16_x3): Likewise.
>         (vld1q_p64_x3): Likewise.
>         (vst1_s64_x2): Likewise.
>         (vst1_u64_x2): Likewise.
>         (vst1_f64_x2): Likewise.
>         (vst1_s8_x2): Likewise.
>         (vst1_p8_x2): Likewise.
>         (vst1_s16_x2): Likewise.
>         (vst1_p16_x2): Likewise.
>         (vst1_s32_x2): Likewise.
>         (vst1_u8_x2): Likewise.
>         (vst1_u16_x2): Likewise.
>         (vst1_u32_x2): Likewise.
>         (vst1_f16_x2): Likewise.
>         (vst1_f32_x2): Likewise.
>         (vst1_p64_x2): Likewise.
>         (vst1q_s8_x2): Likewise.
>         (vst1q_p8_x2): Likewise.
>         (vst1q_s16_x2): Likewise.
>         (vst1q_p16_x2): Likewise.
>         (vst1q_s32_x2): Likewise.
>         (vst1q_s64_x2): Likewise.
>         (vst1q_u8_x2): Likewise.
>         (vst1q_u16_x2): Likewise.
>         (vst1q_u32_x2): Likewise.
>         (vst1q_u64_x2): Likewise.
>         (vst1q_f16_x2): Likewise.
>         (vst1q_f32_x2): Likewise.
>         (vst1q_f64_x2): Likewise.
>         (vst1q_p64_x2): Likewise.
>         (vst1_s64_x3): Likewise.
>         (vst1_u64_x3): Likewise.
>         (vst1_f64_x3): Likewise.
>         (vst1_s8_x3): Likewise.
>         (vst1_p8_x3): Likewise.
>         (vst1_s16_x3): Likewise.
>         (vst1_p16_x3): Likewise.
>         (vst1_s32_x3): Likewise.
>         (vst1_u8_x3): Likewise.
>         (vst1_u16_x3): Likewise.
>         (vst1_u32_x3): Likewise.
>         (vst1_f16_x3): Likewise.
>         (vst1_f32_x3): Likewise.
>         (vst1_p64_x3): Likewise.
>         (vst1q_s8_x3): Likewise.
>         (vst1q_p8_x3): Likewise.
>         (vst1q_s16_x3): Likewise.
>         (vst1q_p16_x3): Likewise.
>         (vst1q_s32_x3): Likewise.
>         (vst1q_s64_x3): Likewise.
>         (vst1q_u8_x3): Likewise.
>         (vst1q_u16_x3): Likewise.
>         (vst1q_u32_x3): Likewise.
>         (vst1q_u64_x3): Likewise.
>         (vst1q_f16_x3): Likewise.
>         (vst1q_f32_x3): Likewise.
>         (vst1q_f64_x3): Likewise.
>         (vst1q_p64_x3): Likewise.
>
> gcc/testsuite/Changelog:
>
> 2018-05-01  Sameera Deshpande  <sameera.deshpa...@linaro.org>
>
>         * gcc.target/aarch64/advsimd-intrinsics/vld1x3.c: New test for
> vld1x3 intrinsics for aarch64.
>         * gcc.target/aarch64/advsimd-intrinsics/vst1x2.c: New test for
> vst1x2 intrinsics for aarch64.
>         * gcc.target/aarch64/advsimd-intrinsics/vst1x3.c: New test for
> vst1x3 intrinsics for aarch64.

Gentle reminder!

-- 
- Thanks and regards,
  Sameera D.

Reply via email to