Evandro Menezes wrote:
> > The division variant should use the same latency reduction trick I 
> > mentioned for sqrt.
>
> I don't think that it applies here, since it doesn't have to deal with
> special cases.

No, it does apply, since it's exactly the same calculation: x * rsqrt(y)
and x * recip(y). In both cases you don't need the final result of rsqrt(y)
or recip(y), which avoids a multiply. Given that these sequences are high
latency, this saving is actually quite important.
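
To illustrate, here is a minimal sketch in plain C, not the actual patch
or the instruction sequence GCC emits. The names recip_estimate, div_naive
and div_folded are hypothetical, the estimate is a stand-in for a hardware
reciprocal estimate, and special cases (zero, infinity, NaN, values outside
float range) are ignored. It assumes a single Newton-Raphson step.

```c
/* Hypothetical stand-in for a hardware reciprocal estimate: rounding
   through float gives a deliberately rough starting value. */
static double recip_estimate(double y)
{
    return (double)(1.0f / (float)y);
}

/* Straightforward form: finish recip(y) first, then multiply by x.
   Dependency chain after the estimate: s -> r1 -> result. */
static double div_naive(double x, double y)
{
    double r0 = recip_estimate(y);
    double s  = 2.0 - y * r0;   /* Newton-Raphson correction factor */
    double r1 = r0 * s;         /* refined recip(y) */
    return x * r1;              /* multiply waits on r1 */
}

/* Folded form: recip(y) is never fully formed. x * r0 does not depend
   on s, so it can overlap with the correction step.
   Dependency chain after the estimate: s -> result. */
static double div_folded(double x, double y)
{
    double r0 = recip_estimate(y);
    double q0 = x * r0;         /* independent of s */
    double s  = 2.0 - y * r0;   /* Newton-Raphson correction factor */
    return q0 * s;              /* one multiply fewer on the chain */
}
```

Both forms compute x * r0 * s up to rounding; the folded one just takes a
multiply off the dependency chain. The sqrt case has the same shape with
rsqrt in place of recip.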

Wilco
