Re: [Mesa-dev] [PATCH] glsl: improve accuracy of atan()

Ian Romanick Tue, 23 Sep 2014 19:11:19 -0700

On 09/23/2014 03:39 PM, Erik Faye-Lund wrote:
> Our current atan()-approximation is pretty inaccurate at 1.0, so
> let's try to improve the situation by doing a direct approximation
> without going through atan.
> 
> This new implementation uses an 11th degree polynomial to approximate
> atan in the [-1..1] range, and the following identitiy to reduce the
> entire range to [-1..1]:
> 
> atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x)
> 
> This range-reduction idea is taken from the paper "Fast computation
> of Arctangent Functions for Embedded Applications: A Comparative
> Analysis" (Ukil et al. 2011).
> 
> The polynomial that approximates atan(x) is:
> 
> x   * 0.9999793128310355 - x^3  * 0.3326756418091246 +
> x^5 * 0.1938924977115610 - x^7  * 0.1173503194786851 +
> x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444
> 
> This polynomial was found with the following GNU Octave script:
> 
> x = linspace(0, 1);
> y = atan(x);
> n = [1, 3, 5, 7, 9, 11];
> format long;
> polyfitc(x, y, n)
> 
> The polyfitc function is not built-in, but too long to include here.
> It can be downloaded from the following URL:
> 
> http://www.mathworks.com/matlabcentral/fileexchange/47851-constraint-polynomial-fit/content/polyfitc.m
> 
> This fixes the following piglit test:
> shaders/glsl-const-folding-01
> 
> Signed-off-by: Erik Faye-Lund <kusmab...@gmail.com>


Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49713

See below...

> ---
> I noticed that glsl-const-folding-01 failed due to a pretty inaccurate
> result for atan(1.0) * 4.0.
> 
> The result should be pi, but we only produce 3.141298. The new
> approximation gets us to 3.141580, which is at least better than before.
> 
> I also tried using one less term (this is what Microsoft's HLSL compiler
> does) as well as a bunch of other simpler approximations, but that didn't
> get us far enough to pass the test.
> 
> Depressingly enough, it seems NVIDIA's proprietary drivers uses the same
> polynomial as the HLSL compiler, but passes the test because it uses an
> even more accurate atan implementation during constant-folding.
> 
> We could also go down that path, by introducing an ir_unop_atan that
> gets lowered to a polynomial before code-generation. That would benefit
> drivers for hardware with atan-support (at least Mali supports this,
> according to http://limadriver.org/Lima+ISA/), but it worries me a bit to
> do constant folding with different precision than execution. But perhaps
> we already have that problem, only a bit more subtle?

We used to implement constant folding for many built-in functions this
way.  Built-in functions like atan were detected in the constant folding
process, and C library functions were used.  Commit 363c14ae0 changed
this to simplify the constant folding code.  This test case began
failing at that time, and Vinson submitted bug #49713.

I'm not worried about any potential performance impact because I've
never seen any application use atan.   Of course, that's also why that
bug has sat open for long. :)

Reviewed-by: Ian Romanick <ian.d.roman...@intel.com>

>  src/glsl/builtin_functions.cpp | 68 
> +++++++++++++++++++++++++++++++++++-------
>  1 file changed, 58 insertions(+), 10 deletions(-)
> 
> diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
> index 9be7f6d..1820dd5 100644
> --- a/src/glsl/builtin_functions.cpp
> +++ b/src/glsl/builtin_functions.cpp
> @@ -442,6 +442,7 @@ private:
>     ir_swizzle *matrix_elt(ir_variable *var, int col, int row);
>  
>     ir_expression *asin_expr(ir_variable *x);
> +   void do_atan(ir_factory &body, const glsl_type *type, ir_variable *res, 
> operand y_over_x);
>  
>     /**
>      * Call function \param f with parameters specified as the linked
> @@ -2684,11 +2685,7 @@ builtin_builder::_atan2(const glsl_type *type)
>        ir_factory outer_then(&outer_if->then_instructions, mem_ctx);
>  
>        /* Then...call atan(y/x) */
> -      ir_variable *y_over_x = outer_then.make_temp(glsl_type::float_type, 
> "y_over_x");
> -      outer_then.emit(assign(y_over_x, div(y, x)));
> -      outer_then.emit(assign(r, mul(y_over_x, rsq(add(mul(y_over_x, 
> y_over_x),
> -                                                      imm(1.0f))))));
> -      outer_then.emit(assign(r, asin_expr(r)));
> +      do_atan(body, glsl_type::float_type, r, div(y, x));
>  
>        /*     ...and fix it up: */
>        ir_if *inner_if = new(mem_ctx) ir_if(less(x, imm(0.0f)));
> @@ -2711,17 +2708,68 @@ builtin_builder::_atan2(const glsl_type *type)
>     return sig;
>  }
>  
> +void
> +builtin_builder::do_atan(ir_factory &body, const glsl_type *type, 
> ir_variable *res, operand y_over_x)
> +{
> +   /*
> +    * range-reduction, first step:
> +    *
> +    *      / y_over_x         if |y_over_x| <= 1.0;
> +    * x = <
> +    *      \ 1.0 / y_over_x   otherwise
> +    */
> +   ir_variable *x = body.make_temp(type, "atan_x");
> +   body.emit(assign(x, div(min2(abs(y_over_x),
> +                                imm(1.0f)),
> +                           max2(abs(y_over_x),
> +                                imm(1.0f)))));
> +
> +   /*
> +    * approximate atan by evaluating polynomial:
> +    *
> +    * x   * 0.9999793128310355 - x^3  * 0.3326756418091246 +
> +    * x^5 * 0.1938924977115610 - x^7  * 0.1173503194786851 +
> +    * x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444
> +    */
> +   ir_variable *tmp = body.make_temp(type, "atan_tmp");
> +   body.emit(assign(tmp, mul(x, x)));
> +   body.emit(assign(tmp, 
> mul(add(mul(sub(mul(add(mul(sub(mul(add(mul(imm(-0.0121323213173444f),
> +                                                                     tmp),
> +                                                                 
> imm(0.0536813784310406f)),
> +                                                             tmp),
> +                                                         
> imm(0.1173503194786851f)),
> +                                                     tmp),
> +                                                 imm(0.1938924977115610f)),
> +                                             tmp),
> +                                         imm(0.3326756418091246f)),
> +                                     tmp),
> +                                 imm(0.9999793128310355f)),
> +                             x)));
> +
> +   /* range-reduction fixup */
> +   body.emit(assign(tmp, add(tmp,
> +                             csel(greater(abs(y_over_x),
> +                                          swizzle(imm(1.0f),
> +                                                  SWIZZLE_XXXX,
> +                                                  type->components())),
> +                                  add(mul(tmp,
> +                                          imm(-2.0f)),
> +                                      imm(M_PI_2f)),
> +                                  imm(0.0f)))));
> +
> +   /* sign fixup */
> +   body.emit(assign(res, mul(tmp, sign(y_over_x))));
> +}
> +
>  ir_function_signature *
>  builtin_builder::_atan(const glsl_type *type)
>  {
>     ir_variable *y_over_x = in_var(type, "y_over_x");
>     MAKE_SIG(type, always_available, 1, y_over_x);
>  
> -   ir_variable *t = body.make_temp(type, "t");
> -   body.emit(assign(t, mul(y_over_x, rsq(add(mul(y_over_x, y_over_x),
> -                                             imm(1.0f))))));
> -
> -   body.emit(ret(asin_expr(t)));
> +   ir_variable *tmp = body.make_temp(type, "tmp");
> +   do_atan(body, type, tmp, y_over_x);
> +   body.emit(ret(tmp));
>  
>     return sig;
>  }
> 

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glsl: improve accuracy of atan()

Reply via email to