From: Ian Romanick <ian.d.roman...@intel.com>

If one of the multiplicands is an ir_triop_csel and the possible results
of that csel are limited to {-1, 0, 1}, we can distribute the multiply
over the csel to eliminate the multiply:

    x * mix( 0,  1, condition) => mix( 0,  x, condition)
    x * mix( 0, -1, condition) => mix( 0, -x, condition)
    x * mix( 1,  0, condition) => mix( x,  0, condition)
    x * mix( 1, -1, condition) => mix( x, -x, condition)
    x * mix(-1,  0, condition) => mix(-x,  0, condition)
    x * mix(-1,  1, condition) => mix(-x,  x, condition)

This assumes that negation is free.

I have not yet investigated why this hurts more with NIR.

v2: Use swizzle_if_required on operands.

Shader-db results:

GM45 (0x2A42):
total instructions in shared programs: 3545604 -> 3545720 (0.00%)
instructions in affected programs:     94890 -> 95006 (0.12%)
helped:                                83
HURT:                                  286

Iron Lake (0x0046):
total instructions in shared programs: 4975867 -> 4976076 (0.00%)
instructions in affected programs:     98079 -> 98288 (0.21%)
helped:                                86
HURT:                                  389

Sandy Bridge (0x0116):
total instructions in shared programs: 6803299 -> 6802216 (-0.02%)
instructions in affected programs:     299775 -> 298692 (-0.36%)
helped:                                1325
HURT:                                  233
GAINED:                                3

Sandy Bridge (0x0116) NIR:
total instructions in shared programs: 6811661 -> 6817191 (0.08%)
instructions in affected programs:     422203 -> 427733 (1.31%)
helped:                                182
HURT:                                  1939

Ivy Bridge (0x0166):
total instructions in shared programs: 6279602 -> 6278560 (-0.02%)
instructions in affected programs:     303149 -> 302107 (-0.34%)
helped:                                1310
HURT:                                  283

Ivy Bridge (0x0166) NIR:
total instructions in shared programs: 6319127 -> 6324626 (0.09%)
instructions in affected programs:     401418 -> 406917 (1.37%)
helped:                                182
HURT:                                  1929

Haswell (0x0426):
total instructions in shared programs: 5764382 -> 5764021 (-0.01%)
instructions in affected programs:     270160 -> 269799 (-0.13%)
helped:                                1083
HURT:                                  510

Haswell (0x0426) NIR:
total instructions in shared programs: 5794178 -> 5800358 (0.11%)
instructions in affected programs:     359490 -> 365670 (1.72%)
helped:                                182
HURT:                                  1929

Broadwell (0x162E):
total instructions in shared programs: 6812514 -> 6812047 (-0.01%)
instructions in affected programs:     260253 -> 259786 (-0.18%)
helped:                                1134
HURT:                                  449

Broadwell (0x162E) NIR:
total instructions in shared programs: 7008390 -> 7014577 (0.09%)
instructions in affected programs:     358710 -> 364897 (1.72%)
helped:                                182
HURT:                                  1946
GAINED:                                12

Signed-off-by: Ian Romanick <ian.d.roman...@intel.com>
---
 src/glsl/opt_algebraic.cpp | 51 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
index 837b080..b14f82d 100644
--- a/src/glsl/opt_algebraic.cpp
+++ b/src/glsl/opt_algebraic.cpp
@@ -582,6 +582,57 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
                         swizzle_if_required(ir, ir->operands[i ^ 1]),
                         ir_constant::zero(ir, ir->type));
       }
+
+      /* If one of the multiplicands is an ir_triop_csel and the possible
+       * results of that csel are limited to {-1, 0, 1}, we can distribute the
+       * multiply over the csel to eliminate the multiply:
+       *
+       *    x * mix( 0,  1, condition) => mix( 0,  x, condition)
+       *    x * mix( 0, -1, condition) => mix( 0, -x, condition)
+       *    x * mix( 1,  0, condition) => mix( x,  0, condition)
+       *    x * mix( 1, -1, condition) => mix( x, -x, condition)
+       *    x * mix(-1,  0, condition) => mix(-x,  0, condition)
+       *    x * mix(-1,  1, condition) => mix(-x,  x, condition)
+       *
+       * This assumes that negation is free.
+       */
+      for (unsigned i = 0; i < 2; i++) {
+         if (op_expr[i] == NULL || op_expr[i]->operation != ir_triop_csel)
+            continue;
+
+         ir_constant *const c[2] = {
+            op_expr[i]->operands[1]->as_constant(),
+            op_expr[i]->operands[2]->as_constant()
+         };
+
+         if (c[0] == NULL || c[1] == NULL)
+            continue;
+
+         if (!is_vec_one(c[0]) && !is_vec_zero(c[0]) &&
+             !is_vec_negative_one(c[0]))
+            continue;
+
+         if (!is_vec_one(c[1]) && !is_vec_zero(c[1]) &&
+             !is_vec_negative_one(c[1]))
+            continue;
+
+         /* We now know that the ir_triop_csel is compatible with the
+          * optimization.  Assign the other multiplicand to a temporary
+          * variable and rewrite the csel.
+          */
+         ir_variable *const temp =
+            new(mem_ctx) ir_variable(ir->type,
+                                     "mul_over_csel",
+                                     ir_var_temporary);
+
+         base_ir->insert_before(temp);
+         ir_assignment *assignment = assign(temp, ir->operands[i ^ 1]);
+         base_ir->insert_before(assignment);
+
+         return csel(swizzle_if_required(ir, op_expr[i]->operands[0]),
+                     swizzle_if_required(ir, handle_expression(mul(c[0], temp))),
+                     swizzle_if_required(ir, handle_expression(mul(c[1], temp))));
+      }
       break;
 
    case ir_binop_div:
-- 
2.1.0

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
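
For reference, a minimal scalar sketch of the rewrite described in the
commit message. It is not part of the patch and does not use the GLSL IR:
csel(), scale(), mul_of_csel(), csel_of_mul() and forms_agree() are
hypothetical names, and csel(cond, a, b) is modelled on plain floats as
"a if cond, else b". It only illustrates that the two forms compare equal
when both selectable constants come from {-1, 0, 1} and x is finite. For a
NaN or infinite x, x * 0 is NaN while the rewritten form yields 0, so the
sketch deliberately samples finite values only.

```cpp
#include <array>

// Hypothetical scalar stand-in for ir_triop_csel: yields a when cond is
// true and b otherwise.  Not a Mesa API.
float csel(bool cond, float a, float b)
{
   return cond ? a : b;
}

// Computes c * x for a constant c known to be -1, 0, or 1 without issuing
// a multiply, mirroring the "negation is free" assumption.
float scale(float c, float x)
{
   if (c == 0.0f)
      return 0.0f;
   return c > 0.0f ? x : -x;
}

// Original form: x * csel(cond, a, b).
float mul_of_csel(float x, bool cond, float a, float b)
{
   return x * csel(cond, a, b);
}

// Rewritten form: csel(cond, a * x, b * x) with both products folded away.
float csel_of_mul(float x, bool cond, float a, float b)
{
   return csel(cond, scale(a, x), scale(b, x));
}

// Checks that both forms compare equal for every pair of constants drawn
// from {-1, 0, 1}, both condition values, and a few finite samples of x.
bool forms_agree()
{
   const std::array<float, 3> consts = {{-1.0f, 0.0f, 1.0f}};
   const std::array<float, 4> samples = {{-2.5f, 0.0f, 0.75f, 3.0f}};
   const std::array<bool, 2> conds = {{false, true}};

   for (float a : consts)
      for (float b : consts)
         for (float x : samples)
            for (bool cond : conds)
               if (mul_of_csel(x, cond, a, b) != csel_of_mul(x, cond, a, b))
                  return false;
   return true;
}
```

The loop also covers pairs such as (1, 1) and (0, 0) that the table in the
commit message omits; the checks in the patch accept those as well, since
each constant is tested against {-1, 0, 1} independently.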