On 26 January 2015 at 14:23, Christophe Lyon <christophe.l...@linaro.org> wrote: > On 26 January 2015 at 13:10, Tejas Belagod <tejas.bela...@arm.com> wrote: >> On 25/01/15 21:05, Christophe Lyon wrote: >>> >>> On 23 January 2015 at 14:44, Christophe Lyon <christophe.l...@linaro.org> >>> wrote: >>>> >>>> On 23 January 2015 at 12:42, Christophe Lyon <christophe.l...@linaro.org> >>>> wrote: >>>>> >>>>> On 23 January 2015 at 11:18, Tejas Belagod <tejas.bela...@arm.com> >>>>> wrote: >>>>>> >>>>>> On 22/01/15 21:31, Christophe Lyon wrote: >>>>>>> >>>>>>> >>>>>>> On 22 January 2015 at 16:22, Tejas Belagod <tejas.bela...@arm.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 22/01/15 14:28, Christophe Lyon wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 22 January 2015 at 12:19, Tejas Belagod <tejas.bela...@arm.com> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 21/01/15 15:07, Christophe Lyon wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 19 January 2015 at 17:54, Marcus Shawcroft >>>>>>>>>>> <marcus.shawcr...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 19 January 2015 at 15:43, Christophe Lyon >>>>>>>>>>>> <christophe.l...@linaro.org> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 19 January 2015 at 14:29, Marcus Shawcroft >>>>>>>>>>>>> <marcus.shawcr...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 16 January 2015 at 17:52, Christophe Lyon >>>>>>>>>>>>>> <christophe.l...@linaro.org> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> OK provided, as per the previous couple, that we don;t >>>>>>>>>>>>>>>> regression >>>>>>>>>>>>>>>> or >>>>>>>>>>>>>>>> introduce new fails on aarch64[_be] or aarch32. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This patch shows failures on aarch64 and aarch64_be for vmax >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> vmin >>>>>>>>>>>>>>> when the input is -NaN. >>>>>>>>>>>>>>> It's a corner case, and my reading of the ARM ARM is that the >>>>>>>>>>>>>>> result >>>>>>>>>>>>>>> should the same as on aarch32. >>>>>>>>>>>>>>> I haven't had time to look at it in more details though. >>>>>>>>>>>>>>> So, not OK? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> They should have the same behaviour in aarch32 and aarch64. Did >>>>>>>>>>>>>> you >>>>>>>>>>>>>> test on HW or a model? >>>>>>>>>>>>>> >>>>>>>>>>>>> I ran the tests on qemu for aarch32 and aarch64-linux, and on >>>>>>>>>>>>> the >>>>>>>>>>>>> foundation model for aarch64*-elf. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Leave this one out until we understand why it fails. /Marcus >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I've looked at this a bit more. >>>>>>>>>>> We have >>>>>>>>>>> fmax v0.4s, v0.4s, v1.4s >>>>>>>>>>> where v0 is a vector of -NaN (0xffc00000) and v1 is a vector of 1. >>>>>>>>>>> >>>>>>>>>>> The output is still -NaN (0xffc00000), while the test expects >>>>>>>>>>> defaultNaN (0x7fc00000). >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> In the AArch32 execution state, Advanced SIMD FP arithmetic always >>>>>>>>>> uses >>>>>>>>>> the >>>>>>>>>> DefaultNaN setting regardless of the DN-bit value in the FPSCR. In >>>>>>>>>> AArch64 >>>>>>>>>> execution state, result of Advanced SIMD FP arithmetic operations >>>>>>>>>> depend >>>>>>>>>> on >>>>>>>>>> the value of the DN-bit i.e. either propagate the input NaN or >>>>>>>>>> generate >>>>>>>>>> DefaultNaN depending on the value of DN. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Maybe I'm using an outdated doc. On page 2282 of ARMv8 ARM rev C, I >>>>>>>>> can see only the latter (no diff between aarch32 and aarch64 in >>>>>>>>> FPProcessNan pseudo-code) >>>>>>>>> >>>>>>>> >>>>>>>> If you see pg. 4005 in the same doc(rev C), you'll see the FPSCR spec >>>>>>>> - >>>>>>>> under DN: >>>>>>>> >>>>>>>> "The value of this bit only controls scalar floating-point >>>>>>>> arithmetic. >>>>>>>> Advanced SIMD arithmetic always uses the Default NaN setting, >>>>>>>> regardless >>>>>>>> of >>>>>>>> the value of the DN bit." >>>>>>>> >>>>>>>> Also on page 3180 for the description of VMAX(vector FP), it says: >>>>>>>> " >>>>>>>> * max(+0.0, -0.0) = +0.0 >>>>>>>> * If any input is a NaN, the corresponding result element is the >>>>>>>> default >>>>>>>> NaN. >>>>>>>> " >>>>>>>> >>>>>>> Oops I was looking at FMAX (vector) pg 936. >>>>>>> >>>>>>>> The pseudocode for FPMax () on pg. 3180 passes StandardFPSCRValue() >>>>>>>> to >>>>>>>> FPMax() which is on pg. 2285 >>>>>>>> >>>>>>>> // StandardFPSCRValue() >>>>>>>> // ==================== >>>>>>>> FPCRType StandardFPSCRValue() >>>>>>>> return ‘00000’ : FPSCR.AHP : ‘11000000000000000000000000’ >>>>>>>> >>>>>>>> Here bit-25(FPSCR.DN) is set to 1. >>>>>>>> >>>>>>> >>>>>>> So, we should get defaultNaN too on aarch64, and no need to try to >>>>>>> force DN to 1 in gdb? >>>>>>> >>>>>>> What can be wrong? >>>>>>> >>>>>> >>>>>> On pg 3180, I see VMAX(FPSIMD) for A32/T32, not A64. I hope we're >>>>>> reading >>>>>> the same document. >>>>>> >>>>>> Regardless of the page number, if you see the pseudocode for >>>>>> VMAX(FPSIMD) >>>>>> for AArch32, StandardFPSCRValue() (i.e. DN = 1) is passed to FPMax() >>>>>> which >>>>>> means generate DefaultNaN() regardless. >>>>>> >>>>>> OTOH, on pg 936, you have FMAX(vector) for A64 where FPMax() in the >>>>>> pseudocode gets just FPCR. >>>>>> >>>>>> >>>>> Ok, that was my initial understanding but our discussion confused me. >>>>> >>>>> And that's why I tried to force DN = 1 in gdb before single-stepping >>>>> over >>>>> fmax v0.4s, v0.4s, v1.4s >>>>> >>>>> but it changed nothing :-( >>>>> Hence my question about a gdb possible bug or misuse. >>>> >>>> >>>> Hmm... user error, I missed one bit >>>> set $fpcr=0x2000000 >>>> works under gdb. >>>> >>>>> I'll try modifying the test to have it force DN=1. >>>>> >>>> Forcing DN=1 in the test makes it pass. >>>> >>>> I am going to look at adding that cleanly to my test, and resubmit it. >>>> >>>> Thanks, and sorry for the noise. >>>> >>> Here is the updated version: >>> - Now I set DN=1 on AArch64 in clean_results, as it is the main >>> initialization function. >>> - I removed the double negative :-) >>> - I removed the useless [u]int64 and poly variants >>> >>> Christophe. >>> >>> 2015-01-25 Christophe Lyon <christophe.l...@linaro.org> >>> >>> * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h >>> (_ARM_FPSRC): Add DN and AHP fields. >>> (clean_results): Force DN=1 on AArch64. >>> * gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc: New file. >>> * gcc.target/aarch64/advsimd-intrinsics/vhadd.c: New file. >>> * gcc.target/aarch64/advsimd-intrinsics/vhsub.c: New file. >>> * gcc.target/aarch64/advsimd-intrinsics/vmax.c: New file. >>> * gcc.target/aarch64/advsimd-intrinsics/vmin.c: New file. >>> * gcc.target/aarch64/advsimd-intrinsics/vrhadd.c: New file. >>> >> >> I guess you don't need the fake dependency fix for this as this is mostly >> called only once? >> > Yes, that is my current assumption: for the time being there is no > other code which can potentially change this value. > >> + _ARM_FPSCR _afpscr_for_dn; >> + asm volatile ("mrs %0,fpcr" : "=r" (_afpscr_for_dn)); >> + _afpscr_for_dn.b.DN = 1; >> + asm volatile ("msr fpcr,%0" : : "r" (_afpscr_for_dn)); > > Maybe in the future we'll want to check that DN=0 means that we > actually forward a NaN != DefaultNaN, but that can be a further > improvement to this patch. > Marcus, Is it OK to commit this one? This is the only remaining one from this series.
Thanks, Christophe. >> Otherwise, your patch looks OK to me(but I can't approve it). > Thanks for the review. > >> Thanks, >> Tejas. >> >>