On 26 January 2015 at 13:10, Tejas Belagod <tejas.bela...@arm.com> wrote: > On 25/01/15 21:05, Christophe Lyon wrote: >> >> On 23 January 2015 at 14:44, Christophe Lyon <christophe.l...@linaro.org> >> wrote: >>> >>> On 23 January 2015 at 12:42, Christophe Lyon <christophe.l...@linaro.org> >>> wrote: >>>> >>>> On 23 January 2015 at 11:18, Tejas Belagod <tejas.bela...@arm.com> >>>> wrote: >>>>> >>>>> On 22/01/15 21:31, Christophe Lyon wrote: >>>>>> >>>>>> >>>>>> On 22 January 2015 at 16:22, Tejas Belagod <tejas.bela...@arm.com> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On 22/01/15 14:28, Christophe Lyon wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 22 January 2015 at 12:19, Tejas Belagod <tejas.bela...@arm.com> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 21/01/15 15:07, Christophe Lyon wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 19 January 2015 at 17:54, Marcus Shawcroft >>>>>>>>>> <marcus.shawcr...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 19 January 2015 at 15:43, Christophe Lyon >>>>>>>>>>> <christophe.l...@linaro.org> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 19 January 2015 at 14:29, Marcus Shawcroft >>>>>>>>>>>> <marcus.shawcr...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 16 January 2015 at 17:52, Christophe Lyon >>>>>>>>>>>>> <christophe.l...@linaro.org> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>>> OK provided, as per the previous couple, that we don;t >>>>>>>>>>>>>>> regression >>>>>>>>>>>>>>> or >>>>>>>>>>>>>>> introduce new fails on aarch64[_be] or aarch32. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> This patch shows failures on aarch64 and aarch64_be for vmax >>>>>>>>>>>>>> and >>>>>>>>>>>>>> vmin >>>>>>>>>>>>>> when the input is -NaN. >>>>>>>>>>>>>> It's a corner case, and my reading of the ARM ARM is that the >>>>>>>>>>>>>> result >>>>>>>>>>>>>> should the same as on aarch32. >>>>>>>>>>>>>> I haven't had time to look at it in more details though. >>>>>>>>>>>>>> So, not OK? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> They should have the same behaviour in aarch32 and aarch64. Did >>>>>>>>>>>>> you >>>>>>>>>>>>> test on HW or a model? >>>>>>>>>>>>> >>>>>>>>>>>> I ran the tests on qemu for aarch32 and aarch64-linux, and on >>>>>>>>>>>> the >>>>>>>>>>>> foundation model for aarch64*-elf. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Leave this one out until we understand why it fails. /Marcus >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I've looked at this a bit more. >>>>>>>>>> We have >>>>>>>>>> fmax v0.4s, v0.4s, v1.4s >>>>>>>>>> where v0 is a vector of -NaN (0xffc00000) and v1 is a vector of 1. >>>>>>>>>> >>>>>>>>>> The output is still -NaN (0xffc00000), while the test expects >>>>>>>>>> defaultNaN (0x7fc00000). >>>>>>>>>> >>>>>>>>> >>>>>>>>> In the AArch32 execution state, Advanced SIMD FP arithmetic always >>>>>>>>> uses >>>>>>>>> the >>>>>>>>> DefaultNaN setting regardless of the DN-bit value in the FPSCR. In >>>>>>>>> AArch64 >>>>>>>>> execution state, result of Advanced SIMD FP arithmetic operations >>>>>>>>> depend >>>>>>>>> on >>>>>>>>> the value of the DN-bit i.e. either propagate the input NaN or >>>>>>>>> generate >>>>>>>>> DefaultNaN depending on the value of DN. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Maybe I'm using an outdated doc. On page 2282 of ARMv8 ARM rev C, I >>>>>>>> can see only the latter (no diff between aarch32 and aarch64 in >>>>>>>> FPProcessNan pseudo-code) >>>>>>>> >>>>>>> >>>>>>> If you see pg. 4005 in the same doc(rev C), you'll see the FPSCR spec >>>>>>> - >>>>>>> under DN: >>>>>>> >>>>>>> "The value of this bit only controls scalar floating-point >>>>>>> arithmetic. >>>>>>> Advanced SIMD arithmetic always uses the Default NaN setting, >>>>>>> regardless >>>>>>> of >>>>>>> the value of the DN bit." >>>>>>> >>>>>>> Also on page 3180 for the description of VMAX(vector FP), it says: >>>>>>> " >>>>>>> * max(+0.0, -0.0) = +0.0 >>>>>>> * If any input is a NaN, the corresponding result element is the >>>>>>> default >>>>>>> NaN. >>>>>>> " >>>>>>> >>>>>> Oops I was looking at FMAX (vector) pg 936. >>>>>> >>>>>>> The pseudocode for FPMax () on pg. 3180 passes StandardFPSCRValue() >>>>>>> to >>>>>>> FPMax() which is on pg. 2285 >>>>>>> >>>>>>> // StandardFPSCRValue() >>>>>>> // ==================== >>>>>>> FPCRType StandardFPSCRValue() >>>>>>> return ‘00000’ : FPSCR.AHP : ‘11000000000000000000000000’ >>>>>>> >>>>>>> Here bit-25(FPSCR.DN) is set to 1. >>>>>>> >>>>>> >>>>>> So, we should get defaultNaN too on aarch64, and no need to try to >>>>>> force DN to 1 in gdb? >>>>>> >>>>>> What can be wrong? >>>>>> >>>>> >>>>> On pg 3180, I see VMAX(FPSIMD) for A32/T32, not A64. I hope we're >>>>> reading >>>>> the same document. >>>>> >>>>> Regardless of the page number, if you see the pseudocode for >>>>> VMAX(FPSIMD) >>>>> for AArch32, StandardFPSCRValue() (i.e. DN = 1) is passed to FPMax() >>>>> which >>>>> means generate DefaultNaN() regardless. >>>>> >>>>> OTOH, on pg 936, you have FMAX(vector) for A64 where FPMax() in the >>>>> pseudocode gets just FPCR. >>>>> >>>>> >>>> Ok, that was my initial understanding but our discussion confused me. >>>> >>>> And that's why I tried to force DN = 1 in gdb before single-stepping >>>> over >>>> fmax v0.4s, v0.4s, v1.4s >>>> >>>> but it changed nothing :-( >>>> Hence my question about a gdb possible bug or misuse. >>> >>> >>> Hmm... user error, I missed one bit >>> set $fpcr=0x2000000 >>> works under gdb. >>> >>>> I'll try modifying the test to have it force DN=1. >>>> >>> Forcing DN=1 in the test makes it pass. >>> >>> I am going to look at adding that cleanly to my test, and resubmit it. >>> >>> Thanks, and sorry for the noise. >>> >> Here is the updated version: >> - Now I set DN=1 on AArch64 in clean_results, as it is the main >> initialization function. >> - I removed the double negative :-) >> - I removed the useless [u]int64 and poly variants >> >> Christophe. >> >> 2015-01-25 Christophe Lyon <christophe.l...@linaro.org> >> >> * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h >> (_ARM_FPSRC): Add DN and AHP fields. >> (clean_results): Force DN=1 on AArch64. >> * gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc: New file. >> * gcc.target/aarch64/advsimd-intrinsics/vhadd.c: New file. >> * gcc.target/aarch64/advsimd-intrinsics/vhsub.c: New file. >> * gcc.target/aarch64/advsimd-intrinsics/vmax.c: New file. >> * gcc.target/aarch64/advsimd-intrinsics/vmin.c: New file. >> * gcc.target/aarch64/advsimd-intrinsics/vrhadd.c: New file. >> > > I guess you don't need the fake dependency fix for this as this is mostly > called only once? > Yes, that is my current assumption: for the time being there is no other code which can potentially change this value.
> + _ARM_FPSCR _afpscr_for_dn; > + asm volatile ("mrs %0,fpcr" : "=r" (_afpscr_for_dn)); > + _afpscr_for_dn.b.DN = 1; > + asm volatile ("msr fpcr,%0" : : "r" (_afpscr_for_dn)); Maybe in the future we'll want to check that DN=0 means that we actually forward a NaN != DefaultNaN, but that can be a further improvement to this patch. > Otherwise, your patch looks OK to me(but I can't approve it). Thanks for the review. > Thanks, > Tejas. > >