Re: [AArch64] Emit square root using the Newton series

Evandro Menezes Thu, 10 Mar 2016 08:59:32 -0800

On 03/10/16 10:52, Wilco Dijkstra wrote:

> Hi Evandro,
>
>> I have however encountered precision issues with DF, namely some benchmarks in
>> the SPECfp CPU2000 suite would fail to validate.
>
> Accuracy is not an issue; the computation is extremely accurate. The issue is
> that your patch doesn't support sqrt(0.0): it returns NaN rather than zero,
> and that causes the miscompares you're seeing. So support for the zero case
> should be added.
>
> This would be a better expansion, supporting zero, and with lower latency than
> the current sequence:
>
>      fcmp    s0, 0.0         // a == 0.0?
>      beq     zero            // yes: s0 already holds the result
>      frsqrte s1, s0          // x = estimate of 1/sqrt(a)
>      fmul    s2, s1, s1      // x^2
>      frsqrts s2, s0, s2      // (3 - a*x^2) / 2
>      fmul    s1, s1, s2      // x = refined 1/sqrt(a)
>      fmul    s2, s1, s1      // x^2
>      fmul    s1, s0, s1      // a*x
>      frsqrts s2, s0, s2      // (3 - a*x^2) / 2
>      fmul    s0, s1, s2      // a*x * step, approximately sqrt(a)
> zero:
>
> For the vector variant you can't avoid the extra latency of an AND, but it
> should not be slower than it is today.


Thanks for the pointer, Wilco.  I'll work it into the patch.

--
Evandro Menezes
