On 26 January 2015 at 13:10, Tejas Belagod <tejas.bela...@arm.com> wrote:
> On 25/01/15 21:05, Christophe Lyon wrote:
>>
>> On 23 January 2015 at 14:44, Christophe Lyon <christophe.l...@linaro.org>
>> wrote:
>>>
>>> On 23 January 2015 at 12:42, Christophe Lyon <christophe.l...@linaro.org>
>>> wrote:
>>>>
>>>> On 23 January 2015 at 11:18, Tejas Belagod <tejas.bela...@arm.com>
>>>> wrote:
>>>>>
>>>>> On 22/01/15 21:31, Christophe Lyon wrote:
>>>>>>
>>>>>>
>>>>>> On 22 January 2015 at 16:22, Tejas Belagod <tejas.bela...@arm.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 22/01/15 14:28, Christophe Lyon wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 22 January 2015 at 12:19, Tejas Belagod <tejas.bela...@arm.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 21/01/15 15:07, Christophe Lyon wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 19 January 2015 at 17:54, Marcus Shawcroft
>>>>>>>>>> <marcus.shawcr...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 19 January 2015 at 15:43, Christophe Lyon
>>>>>>>>>>> <christophe.l...@linaro.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 19 January 2015 at 14:29, Marcus Shawcroft
>>>>>>>>>>>> <marcus.shawcr...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 16 January 2015 at 17:52, Christophe Lyon
>>>>>>>>>>>>> <christophe.l...@linaro.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> OK provided, as per the previous couple, that we don;t
>>>>>>>>>>>>>>> regression
>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>> introduce new fails on aarch64[_be] or aarch32.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This patch shows failures on aarch64 and aarch64_be for vmax
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>> vmin
>>>>>>>>>>>>>> when the input is -NaN.
>>>>>>>>>>>>>> It's a corner case, and my reading of the ARM ARM is that the
>>>>>>>>>>>>>> result
>>>>>>>>>>>>>> should the same as on aarch32.
>>>>>>>>>>>>>> I haven't had time to look at it in more details though.
>>>>>>>>>>>>>> So, not OK?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> They should have the same behaviour in aarch32 and aarch64. Did
>>>>>>>>>>>>> you
>>>>>>>>>>>>> test on HW or a model?
>>>>>>>>>>>>>
>>>>>>>>>>>> I ran the tests on qemu for aarch32 and aarch64-linux, and on
>>>>>>>>>>>> the
>>>>>>>>>>>> foundation model for aarch64*-elf.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Leave this one out until we understand why it fails. /Marcus
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I've looked at this a bit more.
>>>>>>>>>> We have
>>>>>>>>>> fmax    v0.4s, v0.4s, v1.4s
>>>>>>>>>> where v0 is a vector of -NaN (0xffc00000) and v1 is a vector of 1.
>>>>>>>>>>
>>>>>>>>>> The output is still -NaN (0xffc00000), while the test expects
>>>>>>>>>> defaultNaN (0x7fc00000).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In the AArch32 execution state, Advanced SIMD FP arithmetic always
>>>>>>>>> uses
>>>>>>>>> the
>>>>>>>>> DefaultNaN setting regardless of the DN-bit value in the FPSCR. In
>>>>>>>>> AArch64
>>>>>>>>> execution state, result of Advanced SIMD FP arithmetic operations
>>>>>>>>> depend
>>>>>>>>> on
>>>>>>>>> the value of the DN-bit i.e. either propagate the input NaN or
>>>>>>>>> generate
>>>>>>>>> DefaultNaN depending on the value of DN.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Maybe I'm using an outdated doc. On page 2282 of ARMv8 ARM rev C, I
>>>>>>>> can see only the latter (no diff between aarch32 and aarch64 in
>>>>>>>> FPProcessNan pseudo-code)
>>>>>>>>
>>>>>>>
>>>>>>> If you see pg. 4005 in the same doc(rev C), you'll see the FPSCR spec
>>>>>>> -
>>>>>>> under DN:
>>>>>>>
>>>>>>> "The value of this bit only controls scalar floating-point
>>>>>>> arithmetic.
>>>>>>> Advanced SIMD arithmetic always uses the Default NaN setting,
>>>>>>> regardless
>>>>>>> of
>>>>>>> the value of the DN bit."
>>>>>>>
>>>>>>> Also on page 3180 for the description of VMAX(vector FP), it says:
>>>>>>> "
>>>>>>> *  max(+0.0, -0.0) = +0.0
>>>>>>> * If any input is a NaN, the corresponding result element is the
>>>>>>> default
>>>>>>> NaN.
>>>>>>> "
>>>>>>>
>>>>>> Oops I was looking at FMAX (vector) pg 936.
>>>>>>
>>>>>>> The pseudocode for FPMax () on pg. 3180 passes StandardFPSCRValue()
>>>>>>> to
>>>>>>> FPMax() which is on pg. 2285
>>>>>>>
>>>>>>> // StandardFPSCRValue()
>>>>>>> // ====================
>>>>>>> FPCRType StandardFPSCRValue()
>>>>>>> return ‘00000’ : FPSCR.AHP : ‘11000000000000000000000000’
>>>>>>>
>>>>>>> Here bit-25(FPSCR.DN) is set to 1.
>>>>>>>
>>>>>>
>>>>>> So, we should get defaultNaN too on aarch64, and no need to try to
>>>>>> force DN to 1 in gdb?
>>>>>>
>>>>>> What can be wrong?
>>>>>>
>>>>>
>>>>> On pg 3180, I see VMAX(FPSIMD) for A32/T32, not A64. I hope we're
>>>>> reading
>>>>> the same document.
>>>>>
>>>>> Regardless of the page number, if you see the pseudocode for
>>>>> VMAX(FPSIMD)
>>>>> for AArch32, StandardFPSCRValue() (i.e. DN = 1) is passed to FPMax()
>>>>> which
>>>>> means generate DefaultNaN() regardless.
>>>>>
>>>>> OTOH, on pg 936, you have FMAX(vector) for A64 where FPMax() in the
>>>>> pseudocode gets just FPCR.
>>>>>
>>>>>
>>>> Ok, that was my initial understanding but our discussion confused me.
>>>>
>>>> And that's why I tried to force DN = 1 in gdb before single-stepping
>>>> over
>>>> fmax    v0.4s, v0.4s, v1.4s
>>>>
>>>> but it changed nothing :-(
>>>> Hence my question about a gdb possible bug or misuse.
>>>
>>>
>>> Hmm... user error, I missed one bit
>>> set $fpcr=0x2000000
>>> works under gdb.
>>>
>>>> I'll try modifying the test to have it force DN=1.
>>>>
>>> Forcing DN=1 in the test makes it pass.
>>>
>>> I am going to look at adding that cleanly to my test, and resubmit it.
>>>
>>> Thanks, and sorry for the noise.
>>>
>> Here is the updated version:
>> - Now I set DN=1 on AArch64 in clean_results, as it is the main
>> initialization function.
>> - I removed the double negative :-)
>> - I removed the useless [u]int64 and poly variants
>>
>> Christophe.
>>
>> 2015-01-25  Christophe Lyon  <christophe.l...@linaro.org>
>>
>> * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
>> (_ARM_FPSRC): Add DN and AHP fields.
>> (clean_results): Force DN=1 on AArch64.
>> * gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc: New file.
>> * gcc.target/aarch64/advsimd-intrinsics/vhadd.c: New file.
>> * gcc.target/aarch64/advsimd-intrinsics/vhsub.c: New file.
>> * gcc.target/aarch64/advsimd-intrinsics/vmax.c: New file.
>> * gcc.target/aarch64/advsimd-intrinsics/vmin.c: New file.
>> * gcc.target/aarch64/advsimd-intrinsics/vrhadd.c: New file.
>>
>
> I guess you don't need the fake dependency fix for this as this is mostly
> called only once?
>
Yes, that is my current assumption: for the time being there is no
other code which can potentially change this value.

> +  _ARM_FPSCR _afpscr_for_dn;
> +  asm volatile ("mrs %0,fpcr" : "=r" (_afpscr_for_dn));
> +  _afpscr_for_dn.b.DN = 1;
> +  asm volatile ("msr fpcr,%0" : : "r" (_afpscr_for_dn));

Maybe in the future we'll want to check that DN=0 means that we
actually forward a NaN != DefaultNaN, but that can be a further
improvement to this patch.

> Otherwise, your patch looks OK to me(but I can't approve it).
Thanks for the review.

> Thanks,
> Tejas.
>
>

Reply via email to