On 09/23/2014 03:39 PM, Erik Faye-Lund wrote: > Our current atan()-approximation is pretty inaccurate at 1.0, so > let's try to improve the situation by doing a direct approximation > without going through atan. > > This new implementation uses an 11th degree polynomial to approximate > atan in the [-1..1] range, and the following identitiy to reduce the > entire range to [-1..1]: > > atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x) > > This range-reduction idea is taken from the paper "Fast computation > of Arctangent Functions for Embedded Applications: A Comparative > Analysis" (Ukil et al. 2011). > > The polynomial that approximates atan(x) is: > > x * 0.9999793128310355 - x^3 * 0.3326756418091246 + > x^5 * 0.1938924977115610 - x^7 * 0.1173503194786851 + > x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444 > > This polynomial was found with the following GNU Octave script: > > x = linspace(0, 1); > y = atan(x); > n = [1, 3, 5, 7, 9, 11]; > format long; > polyfitc(x, y, n) > > The polyfitc function is not built-in, but too long to include here. > It can be downloaded from the following URL: > > http://www.mathworks.com/matlabcentral/fileexchange/47851-constraint-polynomial-fit/content/polyfitc.m > > This fixes the following piglit test: > shaders/glsl-const-folding-01 > > Signed-off-by: Erik Faye-Lund <kusmab...@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49713 See below... > --- > I noticed that glsl-const-folding-01 failed due to a pretty inaccurate > result for atan(1.0) * 4.0. > > The result should be pi, but we only produce 3.141298. The new > approximation gets us to 3.141580, which is at least better than before. > > I also tried using one less term (this is what Microsoft's HLSL compiler > does) as well as a bunch of other simpler approximations, but that didn't > get us far enough to pass the test. > > Depressingly enough, it seems NVIDIA's proprietary drivers uses the same > polynomial as the HLSL compiler, but passes the test because it uses an > even more accurate atan implementation during constant-folding. > > We could also go down that path, by introducing an ir_unop_atan that > gets lowered to a polynomial before code-generation. That would benefit > drivers for hardware with atan-support (at least Mali supports this, > according to http://limadriver.org/Lima+ISA/), but it worries me a bit to > do constant folding with different precision than execution. But perhaps > we already have that problem, only a bit more subtle? We used to implement constant folding for many built-in functions this way. Built-in functions like atan were detected in the constant folding process, and C library functions were used. Commit 363c14ae0 changed this to simplify the constant folding code. This test case began failing at that time, and Vinson submitted bug #49713. I'm not worried about any potential performance impact because I've never seen any application use atan. Of course, that's also why that bug has sat open for long. :) Reviewed-by: Ian Romanick <ian.d.roman...@intel.com> > src/glsl/builtin_functions.cpp | 68 > +++++++++++++++++++++++++++++++++++------- > 1 file changed, 58 insertions(+), 10 deletions(-) > > diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp > index 9be7f6d..1820dd5 100644 > --- a/src/glsl/builtin_functions.cpp > +++ b/src/glsl/builtin_functions.cpp > @@ -442,6 +442,7 @@ private: > ir_swizzle *matrix_elt(ir_variable *var, int col, int row); > > ir_expression *asin_expr(ir_variable *x); > + void do_atan(ir_factory &body, const glsl_type *type, ir_variable *res, > operand y_over_x); > > /** > * Call function \param f with parameters specified as the linked > @@ -2684,11 +2685,7 @@ builtin_builder::_atan2(const glsl_type *type) > ir_factory outer_then(&outer_if->then_instructions, mem_ctx); > > /* Then...call atan(y/x) */ > - ir_variable *y_over_x = outer_then.make_temp(glsl_type::float_type, > "y_over_x"); > - outer_then.emit(assign(y_over_x, div(y, x))); > - outer_then.emit(assign(r, mul(y_over_x, rsq(add(mul(y_over_x, > y_over_x), > - imm(1.0f)))))); > - outer_then.emit(assign(r, asin_expr(r))); > + do_atan(body, glsl_type::float_type, r, div(y, x)); > > /* ...and fix it up: */ > ir_if *inner_if = new(mem_ctx) ir_if(less(x, imm(0.0f))); > @@ -2711,17 +2708,68 @@ builtin_builder::_atan2(const glsl_type *type) > return sig; > } > > +void > +builtin_builder::do_atan(ir_factory &body, const glsl_type *type, > ir_variable *res, operand y_over_x) > +{ > + /* > + * range-reduction, first step: > + * > + * / y_over_x if |y_over_x| <= 1.0; > + * x = < > + * \ 1.0 / y_over_x otherwise > + */ > + ir_variable *x = body.make_temp(type, "atan_x"); > + body.emit(assign(x, div(min2(abs(y_over_x), > + imm(1.0f)), > + max2(abs(y_over_x), > + imm(1.0f))))); > + > + /* > + * approximate atan by evaluating polynomial: > + * > + * x * 0.9999793128310355 - x^3 * 0.3326756418091246 + > + * x^5 * 0.1938924977115610 - x^7 * 0.1173503194786851 + > + * x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444 > + */ > + ir_variable *tmp = body.make_temp(type, "atan_tmp"); > + body.emit(assign(tmp, mul(x, x))); > + body.emit(assign(tmp, > mul(add(mul(sub(mul(add(mul(sub(mul(add(mul(imm(-0.0121323213173444f), > + tmp), > + > imm(0.0536813784310406f)), > + tmp), > + > imm(0.1173503194786851f)), > + tmp), > + imm(0.1938924977115610f)), > + tmp), > + imm(0.3326756418091246f)), > + tmp), > + imm(0.9999793128310355f)), > + x))); > + > + /* range-reduction fixup */ > + body.emit(assign(tmp, add(tmp, > + csel(greater(abs(y_over_x), > + swizzle(imm(1.0f), > + SWIZZLE_XXXX, > + type->components())), > + add(mul(tmp, > + imm(-2.0f)), > + imm(M_PI_2f)), > + imm(0.0f))))); > + > + /* sign fixup */ > + body.emit(assign(res, mul(tmp, sign(y_over_x)))); > +} > + > ir_function_signature * > builtin_builder::_atan(const glsl_type *type) > { > ir_variable *y_over_x = in_var(type, "y_over_x"); > MAKE_SIG(type, always_available, 1, y_over_x); > > - ir_variable *t = body.make_temp(type, "t"); > - body.emit(assign(t, mul(y_over_x, rsq(add(mul(y_over_x, y_over_x), > - imm(1.0f)))))); > - > - body.emit(ret(asin_expr(t))); > + ir_variable *tmp = body.make_temp(type, "tmp"); > + do_atan(body, type, tmp, y_over_x); > + body.emit(ret(tmp)); > > return sig; > } > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev