On 03/10/16 10:52, Wilco Dijkstra wrote:
Hi Evandro,
I have, however, encountered precision issues with DF: some benchmarks in
the SPECfp CPU2000 suite fail to validate.
Accuracy is not an issue; the computation is extremely accurate. The issue is
that your patch doesn't support sqrt(0.0) - it returns NaN rather than zero,
and that causes the miscompares you're seeing. So support for the zero case
should be added.
This would be a better expansion, supporting zero, and with lower latency than
the current sequence:
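    fcmp s0, 0.0            // is the input zero?
    beq zero                // yes: skip the sequence, s0 is already the result
    frsqrte s1, s0          // s1 = initial estimate of 1/sqrt(s0)
    fmul s2, s1, s1         // s2 = estimate squared
    frsqrts s2, s0, s2      // s2 = (3 - s0 * s2) / 2
    fmul s1, s1, s2         // s1 = refined estimate of 1/sqrt(s0)
    fmul s2, s1, s1         // s2 = refined estimate squared
    fmul s1, s0, s1         // s1 = s0 * estimate, approximately sqrt(s0)
    frsqrts s2, s0, s2      // s2 = final correction factor
    fmul s0, s1, s2         // s0 = sqrt(s0)
zero: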
For the vector variant you can't avoid the extra latency of an AND, but it
should not be slower than it is today.
Thanks for the pointer, Wilco. I'll work it into the patch.
--
Evandro Menezes