On 08/15/2011 09:48 AM, Paul Berry wrote:
On 15 August 2011 08:50, Jose Fonseca<jfons...@vmware.com> wrote:
In places you don't have native int division support, you could use one
Newton-Raphson iteration step for almost accurate results, assuggested accuracy
of SSE2's RCPPS instructions. See for reference the following llvmpipe comment:
/**
* Do one Newton-Raphson step to improve reciprocate precision:
*
* x_{i+1} = x_i * (2 - a * x_i)
*
* XXX: Unfortunately this won't give IEEE-754 conformant results for 0 or
* +/-Inf, giving NaN instead. Certain applications rely on this behavior,
* such as Google Earth, which does RCP(RSQRT(0.0) when drawing the Earth's
* halo. It would be necessary to clamp the argument to prevent this.
*
* See also:
* -
http://en.wikipedia.org/wiki/Division_(digital)#Newton.E2.80.93Raphson_division
* - http://softwarecommunity.intel.com/articles/eng/1818.htm
*/
The softwarecommunity.intel.com link is down, but the "IntelĀ® 64 and IA-32
Architectures Optimization Reference Manual" also documents this.
As mentioned, the N-R iteration gives wrong results for the reciprocate of
+/-inf, but that's guaranteed to never happen when the arguments are integers
encoded as floats.
Jose
Thanks for the reference, Jose. My understanding is that applying
Newton-Raphson to the reciprocal operation won't help directly in this
case (since the problem is due to the fundamental precision limit of
32-bit floats, and happens even if the reciprocal is computed
perfectly), but we may be able to cook up a variation on N-R that
works in this case.
Another idea I've been toying around with would be to implement the
equivalent of the following GLSL:
int divide(int x, int y)
{
int quotient = int(float(x)*reciprocal(float(y)));
if ((quotient+1)*y == x) {
// Fix the case where y divides x exactly, and rounding errors cause
// us to compute the wrong quotient.
quotient = quotient + 1;
}
return quotient;
}
(the actual routine would have to be slightly more complex than this
to make sure to do the right thing for negative values).
It's kind of an ugly hack, but it wouldn't cost too many GPU
instructions, and it would give us sufficient accuracy for pre-GLSL
1.30.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
You might also want to consider implementing
quotient = int((float(x) + 0.5 * float(y)) * reciprocal(float(y)));
This rounds the result to the nearest integer rather then flooring the
result and is arguably faster (assuming that common subexpressions for
float(y) are hoisted).
cheers, danm
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev