On 26 January 2015 at 14:23, Christophe Lyon <christophe.l...@linaro.org> wrote:
> On 26 January 2015 at 13:10, Tejas Belagod <tejas.bela...@arm.com> wrote:
>> On 25/01/15 21:05, Christophe Lyon wrote:
>>>
>>> On 23 January 2015 at 14:44, Christophe Lyon <christophe.l...@linaro.org>
>>> wrote:
>>>>
>>>> On 23 January 2015 at 12:42, Christophe Lyon <christophe.l...@linaro.org>
>>>> wrote:
>>>>>
>>>>> On 23 January 2015 at 11:18, Tejas Belagod <tejas.bela...@arm.com>
>>>>> wrote:
>>>>>>
>>>>>> On 22/01/15 21:31, Christophe Lyon wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 22 January 2015 at 16:22, Tejas Belagod <tejas.bela...@arm.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 22/01/15 14:28, Christophe Lyon wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 22 January 2015 at 12:19, Tejas Belagod <tejas.bela...@arm.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 21/01/15 15:07, Christophe Lyon wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 19 January 2015 at 17:54, Marcus Shawcroft
>>>>>>>>>>> <marcus.shawcr...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 19 January 2015 at 15:43, Christophe Lyon
>>>>>>>>>>>> <christophe.l...@linaro.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 19 January 2015 at 14:29, Marcus Shawcroft
>>>>>>>>>>>>> <marcus.shawcr...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 16 January 2015 at 17:52, Christophe Lyon
>>>>>>>>>>>>>> <christophe.l...@linaro.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> OK provided, as per the previous couple, that we don;t
>>>>>>>>>>>>>>>> regression
>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>> introduce new fails on aarch64[_be] or aarch32.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This patch shows failures on aarch64 and aarch64_be for vmax
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> vmin
>>>>>>>>>>>>>>> when the input is -NaN.
>>>>>>>>>>>>>>> It's a corner case, and my reading of the ARM ARM is that the
>>>>>>>>>>>>>>> result
>>>>>>>>>>>>>>> should the same as on aarch32.
>>>>>>>>>>>>>>> I haven't had time to look at it in more details though.
>>>>>>>>>>>>>>> So, not OK?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> They should have the same behaviour in aarch32 and aarch64. Did
>>>>>>>>>>>>>> you
>>>>>>>>>>>>>> test on HW or a model?
>>>>>>>>>>>>>>
>>>>>>>>>>>>> I ran the tests on qemu for aarch32 and aarch64-linux, and on
>>>>>>>>>>>>> the
>>>>>>>>>>>>> foundation model for aarch64*-elf.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Leave this one out until we understand why it fails. /Marcus
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I've looked at this a bit more.
>>>>>>>>>>> We have
>>>>>>>>>>> fmax    v0.4s, v0.4s, v1.4s
>>>>>>>>>>> where v0 is a vector of -NaN (0xffc00000) and v1 is a vector of 1.
>>>>>>>>>>>
>>>>>>>>>>> The output is still -NaN (0xffc00000), while the test expects
>>>>>>>>>>> defaultNaN (0x7fc00000).
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In the AArch32 execution state, Advanced SIMD FP arithmetic always
>>>>>>>>>> uses
>>>>>>>>>> the
>>>>>>>>>> DefaultNaN setting regardless of the DN-bit value in the FPSCR. In
>>>>>>>>>> AArch64
>>>>>>>>>> execution state, result of Advanced SIMD FP arithmetic operations
>>>>>>>>>> depend
>>>>>>>>>> on
>>>>>>>>>> the value of the DN-bit i.e. either propagate the input NaN or
>>>>>>>>>> generate
>>>>>>>>>> DefaultNaN depending on the value of DN.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Maybe I'm using an outdated doc. On page 2282 of ARMv8 ARM rev C, I
>>>>>>>>> can see only the latter (no diff between aarch32 and aarch64 in
>>>>>>>>> FPProcessNan pseudo-code)
>>>>>>>>>
>>>>>>>>
>>>>>>>> If you see pg. 4005 in the same doc(rev C), you'll see the FPSCR spec
>>>>>>>> -
>>>>>>>> under DN:
>>>>>>>>
>>>>>>>> "The value of this bit only controls scalar floating-point
>>>>>>>> arithmetic.
>>>>>>>> Advanced SIMD arithmetic always uses the Default NaN setting,
>>>>>>>> regardless
>>>>>>>> of
>>>>>>>> the value of the DN bit."
>>>>>>>>
>>>>>>>> Also on page 3180 for the description of VMAX(vector FP), it says:
>>>>>>>> "
>>>>>>>> *  max(+0.0, -0.0) = +0.0
>>>>>>>> * If any input is a NaN, the corresponding result element is the
>>>>>>>> default
>>>>>>>> NaN.
>>>>>>>> "
>>>>>>>>
>>>>>>> Oops I was looking at FMAX (vector) pg 936.
>>>>>>>
>>>>>>>> The pseudocode for FPMax () on pg. 3180 passes StandardFPSCRValue()
>>>>>>>> to
>>>>>>>> FPMax() which is on pg. 2285
>>>>>>>>
>>>>>>>> // StandardFPSCRValue()
>>>>>>>> // ====================
>>>>>>>> FPCRType StandardFPSCRValue()
>>>>>>>> return ‘00000’ : FPSCR.AHP : ‘11000000000000000000000000’
>>>>>>>>
>>>>>>>> Here bit-25(FPSCR.DN) is set to 1.
>>>>>>>>
>>>>>>>
>>>>>>> So, we should get defaultNaN too on aarch64, and no need to try to
>>>>>>> force DN to 1 in gdb?
>>>>>>>
>>>>>>> What can be wrong?
>>>>>>>
>>>>>>
>>>>>> On pg 3180, I see VMAX(FPSIMD) for A32/T32, not A64. I hope we're
>>>>>> reading
>>>>>> the same document.
>>>>>>
>>>>>> Regardless of the page number, if you see the pseudocode for
>>>>>> VMAX(FPSIMD)
>>>>>> for AArch32, StandardFPSCRValue() (i.e. DN = 1) is passed to FPMax()
>>>>>> which
>>>>>> means generate DefaultNaN() regardless.
>>>>>>
>>>>>> OTOH, on pg 936, you have FMAX(vector) for A64 where FPMax() in the
>>>>>> pseudocode gets just FPCR.
>>>>>>
>>>>>>
>>>>> Ok, that was my initial understanding but our discussion confused me.
>>>>>
>>>>> And that's why I tried to force DN = 1 in gdb before single-stepping
>>>>> over
>>>>> fmax    v0.4s, v0.4s, v1.4s
>>>>>
>>>>> but it changed nothing :-(
>>>>> Hence my question about a gdb possible bug or misuse.
>>>>
>>>>
>>>> Hmm... user error, I missed one bit
>>>> set $fpcr=0x2000000
>>>> works under gdb.
>>>>
>>>>> I'll try modifying the test to have it force DN=1.
>>>>>
>>>> Forcing DN=1 in the test makes it pass.
>>>>
>>>> I am going to look at adding that cleanly to my test, and resubmit it.
>>>>
>>>> Thanks, and sorry for the noise.
>>>>
>>> Here is the updated version:
>>> - Now I set DN=1 on AArch64 in clean_results, as it is the main
>>> initialization function.
>>> - I removed the double negative :-)
>>> - I removed the useless [u]int64 and poly variants
>>>
>>> Christophe.
>>>
>>> 2015-01-25  Christophe Lyon  <christophe.l...@linaro.org>
>>>
>>> * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
>>> (_ARM_FPSRC): Add DN and AHP fields.
>>> (clean_results): Force DN=1 on AArch64.
>>> * gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc: New file.
>>> * gcc.target/aarch64/advsimd-intrinsics/vhadd.c: New file.
>>> * gcc.target/aarch64/advsimd-intrinsics/vhsub.c: New file.
>>> * gcc.target/aarch64/advsimd-intrinsics/vmax.c: New file.
>>> * gcc.target/aarch64/advsimd-intrinsics/vmin.c: New file.
>>> * gcc.target/aarch64/advsimd-intrinsics/vrhadd.c: New file.
>>>
>>
>> I guess you don't need the fake dependency fix for this as this is mostly
>> called only once?
>>
> Yes, that is my current assumption: for the time being there is no
> other code which can potentially change this value.
>
>> +  _ARM_FPSCR _afpscr_for_dn;
>> +  asm volatile ("mrs %0,fpcr" : "=r" (_afpscr_for_dn));
>> +  _afpscr_for_dn.b.DN = 1;
>> +  asm volatile ("msr fpcr,%0" : : "r" (_afpscr_for_dn));
>
> Maybe in the future we'll want to check that DN=0 means that we
> actually forward a NaN != DefaultNaN, but that can be a further
> improvement to this patch.
>
Marcus,
Is it OK to commit this one?
This is the only remaining one from this series.

Thanks,

Christophe.

>> Otherwise, your patch looks OK to me(but I can't approve it).
> Thanks for the review.
>
>> Thanks,
>> Tejas.
>>
>>

Reply via email to