----- Original Message -----
> On 13 August 2011 09:58, Kenneth Graunke <kenn...@whitecape.org> wrote:
> > On 08/12/2011 10:38 AM, Paul Berry wrote:
> >>
> >> This patch fixes a bug when lowering an integer division:
> >>
> >>   x/y
> >>
> >> to a multiplication by a reciprocal:
> >>
> >>   int(float(x)*reciprocal(float(y)))
> >>
> >> If x was a plain int and y was an ivecN, the lowering pass
> >> incorrectly assigned the type of the product to be float, when in
> >> fact it should be vecN. This caused mesa to abort with an IR
> >> validation error.
> >>
> >> Fixes piglit tests {fs,vs}-op-div-int-ivec{2,3,4}.
> >
> > Good catch, Paul! Thanks again for writing all these test cases.
> >
> > Reviewed-by: Kenneth Graunke <kenn...@whitecape.org>
> >
> > Come to think of it, we may want to avoid this altogether on i965. The
> > mathbox has an INT DIV message that computes integer quotient and
> > remainder...so we can support it natively.
> >
> > I guess the question is "which is faster?". My intuition says that using
> > INT DIV will be faster on Gen6+, possibly on Gen5, and slower on
> > Gen4/G45.
> > AFAICT on Gen5+ you can compute quotient & remainder separately (3 or 4
> > rounds) while on Gen4 you always have to compute both (3 + 4 = 7 rounds?).
> > Meanwhile RCP is 2 rounds. Not only is that more rounds, it means hogging
> > the shared mathbox for longer.
>
> Accuracy is also a question. Our current technique of multiplying by
> the reciprocal doesn't work for some denominators because of rounding
> errors in computing the reciprocal. For example, try to write a
> piglit test that computes 77/77. On Gen5 hardware, at least, this
> produces zero. The reason is that rounding errors in computing the
> floating point reciprocal mean that 77*reciprocal(77) is actually
> slightly less than 1.0, so it gets rounded down to zero when it's
> converted back to an int.
>
> Note: I believe the smallest integer for which rounding errors cause
> n*reciprocal(n) to be less than 1.0 is n=25, which probably explains
> why we haven't noticed this bug before.
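To make the truncation problem above concrete, here is a minimal C sketch of the lowering int(float(x)*reciprocal(float(y))). The names rcp_one_ulp_low and lowered_div are made up for illustration, and the "one ulp too small" reciprocal is only an assumption standing in for an inexact hardware RCP; real hardware error patterns differ (which is why the failures on Gen5 only show up for some denominators).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an inexact reciprocal: take the correctly rounded
 * 1/y and step it down by one unit in the last place. Only meant for
 * positive, finite y; this is an assumption, not how any GPU behaves. */
static float rcp_one_ulp_low(float y)
{
    float r = 1.0f / y;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits -= 1u;                 /* next representable float towards zero */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* The lowering discussed in the thread: multiply by the reciprocal and
 * truncate back to int. */
static int lowered_div(int x, int y)
{
    return (int)((float)x * rcp_one_ulp_low((float)y));
}
```

With this model, lowered_div(77, 77) lands just below 1.0 and truncates to 0, while a quotient that is not close to an integer, such as lowered_div(7, 2), still comes out as 3.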
In places where you don't have native integer division support, you could use one Newton-Raphson iteration step for almost accurate results, as suggested for improving the accuracy of SSE2's RCPPS instructions. See for reference the following llvmpipe comment:

   /**
    * Do one Newton-Raphson step to improve reciprocate precision:
    *
    *   x_{i+1} = x_i * (2 - a * x_i)
    *
    * XXX: Unfortunately this won't give IEEE-754 conformant results for 0 or
    * +/-Inf, giving NaN instead. Certain applications rely on this behavior,
    * such as Google Earth, which does RCP(RSQRT(0.0)) when drawing the Earth's
    * halo. It would be necessary to clamp the argument to prevent this.
    *
    * See also:
    * - http://en.wikipedia.org/wiki/Division_(digital)#Newton.E2.80.93Raphson_division
    * - http://softwarecommunity.intel.com/articles/eng/1818.htm
    */

The softwarecommunity.intel.com link is down, but the "Intel® 64 and IA-32 Architectures Optimization Reference Manual" also documents this.

As mentioned, the N-R iteration gives wrong results for the reciprocal of +/-Inf, but that's guaranteed to never happen when the arguments are integers encoded as floats.

Jose
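For illustration, the N-R step from the llvmpipe comment could be sketched in C roughly as follows. This is not the llvmpipe code: coarse_rcp is a hypothetical stand-in for a low-precision hardware reciprocal (it simply keeps 12 mantissa bits of 1/a, which is an assumption chosen to resemble an RCPPS-class estimate), and refine_rcp and rcp_error are names invented for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical coarse reciprocal: the correctly rounded 1/a with the low
 * 11 mantissa bits cleared, leaving roughly 12 bits of precision.
 * Assumes positive, finite, non-zero a. */
static float coarse_rcp(float a)
{
    float r = 1.0f / a;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF800u;        /* drop the low 11 bits of the mantissa */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step: x_{i+1} = x_i * (2 - a * x_i). */
static float refine_rcp(float a, float x)
{
    return x * (2.0f - a * x);
}

/* Error of x as an estimate of 1/a, measured as |a*x - 1| in double. */
static double rcp_error(float a, float x)
{
    double d = (double)a * (double)x - 1.0;
    return d < 0.0 ? -d : d;
}
```

For a = 77, rcp_error(77.0f, coarse_rcp(77.0f)) is on the order of 1e-4, and after refine_rcp it drops to roughly single-precision rounding level. The refined value can still sit a hair below the exact reciprocal, so truncating x*rcp to int is only "almost accurate", as said above.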