On Wed, Sep 24, 2014 at 4:10 AM, Ian Romanick <i...@freedesktop.org> wrote: > On 09/23/2014 03:39 PM, Erik Faye-Lund wrote: >> Our current atan()-approximation is pretty inaccurate at 1.0, so >> let's try to improve the situation by doing a direct approximation >> without going through atan. >> >> This new implementation uses an 11th degree polynomial to approximate >> atan in the [-1..1] range, and the following identitiy to reduce the >> entire range to [-1..1]: >> >> atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x) >> >> This range-reduction idea is taken from the paper "Fast computation >> of Arctangent Functions for Embedded Applications: A Comparative >> Analysis" (Ukil et al. 2011). >> >> The polynomial that approximates atan(x) is: >> >> x * 0.9999793128310355 - x^3 * 0.3326756418091246 + >> x^5 * 0.1938924977115610 - x^7 * 0.1173503194786851 + >> x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444 >> >> This polynomial was found with the following GNU Octave script: >> >> x = linspace(0, 1); >> y = atan(x); >> n = [1, 3, 5, 7, 9, 11]; >> format long; >> polyfitc(x, y, n) >> >> The polyfitc function is not built-in, but too long to include here. >> It can be downloaded from the following URL: >> >> http://www.mathworks.com/matlabcentral/fileexchange/47851-constraint-polynomial-fit/content/polyfitc.m >> >> This fixes the following piglit test: >> shaders/glsl-const-folding-01 >> >> Signed-off-by: Erik Faye-Lund <kusmab...@gmail.com> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49713 >
Ah! Thanks a lot for that link, very useful. >> We could also go down that path, by introducing an ir_unop_atan that >> gets lowered to a polynomial before code-generation. That would benefit >> drivers for hardware with atan-support (at least Mali supports this, >> according to http://limadriver.org/Lima+ISA/), but it worries me a bit to >> do constant folding with different precision than execution. But perhaps >> we already have that problem, only a bit more subtle? > > We used to implement constant folding for many built-in functions this > way. Built-in functions like atan were detected in the constant folding > process, and C library functions were used. Commit 363c14ae0 changed > this to simplify the constant folding code. This test case began > failing at that time, and Vinson submitted bug #49713. Aha, thanks for clearing that up. However, just to be clear, my alternative suggestion isn't to go back in that direction. It's to make atan a middle-class citizen, like ir_unop_[exp|log]. These always gets lowered to ir_unop_[exp|log]2, but in this case the lowering would go to arithmetic. Something like this (also no piglit regressions): --8<-- diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp index 9be7f6d..cc6c8b3 100644 --- a/src/glsl/builtin_functions.cpp +++ b/src/glsl/builtin_functions.cpp @@ -442,6 +442,7 @@ private: ir_swizzle *matrix_elt(ir_variable *var, int col, int row); ir_expression *asin_expr(ir_variable *x); + void do_atan(ir_factory &body, const glsl_type *type, ir_variable *res, operand y_over_x); /** * Call function \param f with parameters specified as the linked @@ -2596,6 +2597,7 @@ builtin_builder::_degrees(const glsl_type *type) UNOP(sin, ir_unop_sin, always_available) UNOP(cos, ir_unop_cos, always_available) +UNOP(atan, ir_unop_atan, always_available) ir_function_signature * builtin_builder::_tan(const glsl_type *type) @@ -2684,11 +2686,7 @@ builtin_builder::_atan2(const glsl_type *type) ir_factory outer_then(&outer_if->then_instructions, mem_ctx); /* Then...call atan(y/x) */ - ir_variable *y_over_x = outer_then.make_temp(glsl_type::float_type, "y_over_x"); - outer_then.emit(assign(y_over_x, div(y, x))); - outer_then.emit(assign(r, mul(y_over_x, rsq(add(mul(y_over_x, y_over_x), - imm(1.0f)))))); - outer_then.emit(assign(r, asin_expr(r))); + body.emit(assign(r, expr(ir_unop_atan, div(y, x)))); /* ...and fix it up: */ ir_if *inner_if = new(mem_ctx) ir_if(less(x, imm(0.0f))); @@ -2712,21 +2710,6 @@ builtin_builder::_atan2(const glsl_type *type) } ir_function_signature * -builtin_builder::_atan(const glsl_type *type) -{ - ir_variable *y_over_x = in_var(type, "y_over_x"); - MAKE_SIG(type, always_available, 1, y_over_x); - - ir_variable *t = body.make_temp(type, "t"); - body.emit(assign(t, mul(y_over_x, rsq(add(mul(y_over_x, y_over_x), - imm(1.0f)))))); - - body.emit(ret(asin_expr(t))); - - return sig; -} - -ir_function_signature * builtin_builder::_sinh(const glsl_type *type) { ir_variable *x = in_var(type, "x"); diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index 739a9f4..1ec7789 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -247,6 +247,7 @@ ir_expression::ir_expression(int op, ir_rvalue *op0) case ir_unop_cos: case ir_unop_sin_reduced: case ir_unop_cos_reduced: + case ir_unop_atan: case ir_unop_dFdx: case ir_unop_dFdx_coarse: case ir_unop_dFdx_fine: @@ -513,6 +514,7 @@ static const char *const operator_strs[] = { "cos", "sin_reduced", "cos_reduced", + "atan", "dFdx", "dFdxCoarse", "dFdxFine", diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 8003f88..c715203 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -1200,6 +1200,7 @@ enum ir_expression_operation { ir_unop_cos, ir_unop_sin_reduced, /**< Reduced range sin. [-pi, pi] */ ir_unop_cos_reduced, /**< Reduced range cos. [-pi, pi] */ + ir_unop_atan, /*@}*/ /** diff --git a/src/glsl/ir_constant_expression.cpp b/src/glsl/ir_constant_expression.cpp index 1e8b3a3..ca3d4d7 100644 --- a/src/glsl/ir_constant_expression.cpp +++ b/src/glsl/ir_constant_expression.cpp @@ -730,6 +730,12 @@ ir_expression::constant_expression_value(struct hash_table *variable_context) } break; + case ir_unop_atan: + for (unsigned c = 0; c < op[0]->type->components(); c++) { + data.f[c] = atanf(op[0]->value.f[c]); + } + break; + case ir_unop_neg: for (unsigned c = 0; c < op[0]->type->components(); c++) { switch (this->type->base_type) { diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp index 97a581d..9cd04d4 100644 --- a/src/glsl/ir_validate.cpp +++ b/src/glsl/ir_validate.cpp @@ -317,6 +317,7 @@ ir_validate::visit_leave(ir_expression *ir) case ir_unop_cos: case ir_unop_sin_reduced: case ir_unop_cos_reduced: + case ir_unop_atan: case ir_unop_dFdx: case ir_unop_dFdx_coarse: case ir_unop_dFdx_fine: diff --git a/src/glsl/lower_instructions.cpp b/src/glsl/lower_instructions.cpp index 6842853..4879424 100644 --- a/src/glsl/lower_instructions.cpp +++ b/src/glsl/lower_instructions.cpp @@ -112,6 +112,7 @@ */ #include "main/core.h" /* for M_LOG2E */ +#include "program/prog_instruction.h" /* for SWIZZLE_XYZW */ #include "glsl_types.h" #include "ir.h" #include "ir_builder.h" @@ -145,6 +146,7 @@ private: void carry_to_arith(ir_expression *); void borrow_to_arith(ir_expression *); void sat_to_clamp(ir_expression *); + void atan_to_arith(ir_expression *); }; } /* anonymous namespace */ @@ -508,6 +510,75 @@ lower_instructions_visitor::sat_to_clamp(ir_expression *ir) this->progress = true; } +void +lower_instructions_visitor::atan_to_arith(ir_expression *ir) +{ + ir_variable *y_over_x = new(ir) ir_variable(ir->type, "y_over_x", ir_var_temporary); + ir_variable *x = new(ir) ir_variable(ir->type, "x", ir_var_temporary); + ir_variable *xx = new(ir) ir_variable(ir->type, "xx", ir_var_temporary); + ir_variable *tmp = new(ir) ir_variable(ir->type, "tmp", ir_var_temporary); + + ir_instruction &i = *base_ir; + + /* Copy <y_over_x> argument */ + i.insert_before(y_over_x); + i.insert_before(assign(y_over_x, ir->operands[0])); + + /* + * range-reduction, first step: + * + * / y_over_x if |y_over_x| <= 1.0; + * x = < + * \ 1.0 / y_over_x otherwise + */ + i.insert_before(x); + i.insert_before(assign(x, div(min2(abs(y_over_x), + new(ir) ir_constant(1.0f)), + max2(abs(y_over_x), + new(ir) ir_constant(1.0f))))); + + /* + * approximate atan by evaluating polynomial: + * + * x * 0.9999793128310355 - x^3 * 0.3326756418091246 + + * x^5 * 0.1938924977115610 - x^7 * 0.1173503194786851 + + * x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444 + */ + i.insert_before(xx); + i.insert_before(assign(xx, mul(x, x))); + ir_constant *term1 = new(ir) ir_constant(+0.9999793128310355f); + ir_constant *term3 = new(ir) ir_constant(-0.3326756418091246f); + ir_constant *term5 = new(ir) ir_constant(+0.1938924977115610f); + ir_constant *term7 = new(ir) ir_constant(-0.1173503194786851f); + ir_constant *term9 = new(ir) ir_constant(+0.0536813784310406f); + ir_constant *term11 = new(ir) ir_constant(-0.0121323213173444f); + i.insert_before(tmp); + i.insert_before(assign(tmp, mul(add(mul(add(mul(add(mul(add(mul(add(mul(term11, + xx), + term9), + xx), + term7), + xx), + term5), + xx), + term3), + xx), + term1), + x))); + + /* range-reduction and sign fixup */ + ir->operation = ir_binop_mul; + ir->operands[0] = add(tmp, + csel(greater(abs(y_over_x), + new(ir) ir_constant(1.0f, ir->type->components())), + add(mul(tmp, + new(ir) ir_constant(-2.0f)), + new(ir) ir_constant((float) M_PI_2)), + new(ir) ir_constant(0.0f, ir->type->components()))); + ir->operands[1] = sign(y_over_x); + this->progress = true; +} + ir_visitor_status lower_instructions_visitor::visit_leave(ir_expression *ir) { @@ -569,6 +640,10 @@ lower_instructions_visitor::visit_leave(ir_expression *ir) sat_to_clamp(ir); break; + case ir_unop_atan: + // TODO: make a flag for this also + atan_to_arith(ir); + break; + default: return visit_continue; } diff --git a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp index cb0a079..5b56f9a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp @@ -431,6 +431,7 @@ ir_channel_expressions_visitor::visit_leave(ir_assignment *ir) case ir_unop_unpack_unorm_2x16: case ir_unop_unpack_unorm_4x8: case ir_unop_unpack_half_2x16: + case ir_unop_atan: case ir_binop_ldexp: case ir_binop_vector_extract: case ir_triop_vector_insert: diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 2d5318a..2a8a6fb 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -585,6 +586,8 @@ fs_visitor::visit(ir_expression *ir) case ir_unop_cos_reduced: emit_math(SHADER_OPCODE_COS, this->result, op[0]); break; + case ir_unop_atan: + unreachable("not reached: should be handled by atan_to_arith"); case ir_unop_dFdx: emit(FS_OPCODE_DDX, this->result, op[0], fs_reg(BRW_DERIVATIVE_BY_HINT)); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 473b3c5..8ed09b8 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -1365,6 +1366,8 @@ vec4_visitor::visit(ir_expression *ir) case ir_unop_cos_reduced: emit_math(SHADER_OPCODE_COS, result_dst, op[0]); break; + case ir_unop_atan: + unreachable("not reached: should be handled by atan_to_arith"); case ir_unop_dFdx: case ir_unop_dFdx_coarse: diff --git a/src/mesa/program/ir_to_mesa.cpp b/src/mesa/program/ir_to_mesa.cpp index 83cd1ae..991127d 100644 --- a/src/mesa/program/ir_to_mesa.cpp +++ b/src/mesa/program/ir_to_mesa.cpp @@ -1115,6 +1117,9 @@ ir_to_mesa_visitor::visit(ir_expression *ir) case ir_unop_cos_reduced: emit_scs(ir, OPCODE_COS, result_dst, op[0]); break; + case ir_unop_atan: + assert(!"not reached: should be handled by atan_to_arith"); + break; case ir_unop_dFdx: emit(ir, OPCODE_DDX, result_dst, op[0]); diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index 7c2f0fc..ce3744b 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -1401,6 +1403,9 @@ glsl_to_tgsi_visitor::visit(ir_expression *ir) case ir_unop_cos_reduced: emit_scs(ir, TGSI_OPCODE_COS, result_dst, op[0]); break; + case ir_unop_atan: + assert(!"not reached: should be handled by atan_to_arith"); + break; case ir_unop_saturate: { glsl_to_tgsi_instruction *inst; inst = emit(ir, TGSI_OPCODE_MOV, result_dst, op[0]); -->8-- Note that this version also uses the new polynomial, rather than the old approximation. But we get even better constant folding. Dunno if it's worth it -- but it does get rid of that do_atan()-ugliness ;) > I'm not worried about any potential performance impact because I've > never seen any application use atan. Of course, that's also why that > bug has sat open for long. :) Yeah, atan is not very common, and when it's needed people seem to tend to use 1d textures instead. So I think accuracy trumps performance in this case. > Reviewed-by: Ian Romanick <ian.d.roman...@intel.com> > >> + /* range-reduction fixup */ >> + body.emit(assign(tmp, add(tmp, >> + csel(greater(abs(y_over_x), >> + swizzle(imm(1.0f), >> + SWIZZLE_XXXX, >> + type->components())), I guess this should work just as well (sorry for my lack of ir_builder-fu): - swizzle(imm(1.0f), - SWIZZLE_XXXX, - type->components())), + imm(1.0f, type->components())), >> + add(mul(tmp, >> + imm(-2.0f)), >> + imm(M_PI_2f)), >> + imm(0.0f))))); Hm. Don't I need to expand this last immediate to a vector of type->components() size as well? If so, this patch should go on top: diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp index 1820dd5..bfa46eb 100644 --- a/src/glsl/builtin_functions.cpp +++ b/src/glsl/builtin_functions.cpp @@ -2749,13 +2749,11 @@ builtin_builder::do_atan(ir_factory &body, const glsl_type *type, ir_variable *r /* range-reduction fixup */ body.emit(assign(tmp, add(tmp, csel(greater(abs(y_over_x), - swizzle(imm(1.0f), - SWIZZLE_XXXX, - type->components())), + imm(1.0f, type->components())), add(mul(tmp, imm(-2.0f)), imm(M_PI_2f)), - imm(0.0f))))); + imm(0.0f, type->components()))))); /* sign fixup */ body.emit(assign(res, mul(tmp, sign(y_over_x)))); _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev