On 04/01/16 16:22, Wilco Dijkstra wrote:
> Evandro Menezes wrote:
>>> The division variant should use the same latency reduction trick I mentioned for sqrt.
>> I don't think that it applies here, since it doesn't have to deal with special cases.
> No it applies as it's exactly the same calculation: x * rsqrt(y) and x * recip(y). In both cases you don't need the final result of rsqrt(y) or recip(y), avoiding a multiply.
> Given these sequences are high latency this saving is actually quite important.

Wilco,

In the case of sqrt(), the special case of a 0.0 argument must be handled in order to guarantee correctness, because the final multiplication would otherwise compute 0.0 * infinity, which is NaN. Handling this special case hurts performance, and that is where your suggestion helps.
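
To illustrate the zero-argument problem, here is a minimal C sketch. It is not the code from the patch: the function names are hypothetical, and the reciprocal square root value is passed in as a parameter, assuming that an estimate for an argument of 0.0 comes back as +infinity.

```c
#include <math.h>   /* INFINITY, isnan */

/* sqrt(y) recovered from a reciprocal square root value, with no special
   cases.  rsqrt_y is assumed to come from an estimate-plus-refinement
   sequence; it is a parameter here so the sketch stays self-contained. */
static float sqrt_from_rsqrt(float y, float rsqrt_y)
{
    return y * rsqrt_y;   /* 0.0f * INFINITY evaluates to NaN */
}

/* Same computation, but a zero argument is passed through unchanged,
   which costs an extra compare and select. */
static float sqrt_from_rsqrt_checked(float y, float rsqrt_y)
{
    return (y == 0.0f) ? y : y * rsqrt_y;
}
```

Calling sqrt_from_rsqrt(0.0f, INFINITY) yields NaN, while the checked variant returns 0.0.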

However, I don't think that there's a need to handle any special case for division. The only case in which the approximation differs from true division is when the numerator is infinity and the denominator is zero: the approximation returns infinity, whereas the division returns NaN. I don't think that this special case deserves to be handled. In other words, the result of the approximate reciprocal is always needed.
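
To make the two formulations under discussion concrete, here is a rough sketch in plain C rather than the actual AArch64 sequence. The function names, the initial estimate r0 passed as a parameter, and the choice of two Newton-Raphson steps are all assumptions made for illustration.

```c
/* Newton-Raphson correction factor for r ~ 1/d: returns 2 - d*r.
   Plain C stand-in for a reciprocal step instruction, not an intrinsic. */
static float recip_step(float d, float r)
{
    return 2.0f - d * r;
}

/* Variant A: finish refining the reciprocal, then multiply by the
   numerator.  The last multiply depends on the finished reciprocal. */
static float div_full_recip(float n, float d, float r0)
{
    float r1 = r0 * recip_step(d, r0);   /* first refinement */
    float r2 = r1 * recip_step(d, r1);   /* final reciprocal */
    return n * r2;
}

/* Variant B: fold the numerator into the last refinement.  n * r1 does
   not depend on the last correction factor, so the two can be computed
   in parallel, and the final reciprocal is never materialized. */
static float div_folded(float n, float d, float r0)
{
    float r1 = r0 * recip_step(d, r0);
    float q  = n * r1;
    return q * recip_step(d, r1);
}
```

Both variants compute the same product algebraically, so for ordinary finite inputs they should agree up to rounding. The difference is only in the length of the dependency chain.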

Or am I missing something?

Thank you,

--
Evandro Menezes
