On Wed, Mar 11, 2015 at 1:44 PM, Ian Romanick <i...@freedesktop.org> wrote:
> From: Ian Romanick <ian.d.roman...@intel.com>
>
> On platforms that do not natively generate 0u and ~0u for Boolean
> results, b2f expressions that look like
>
>     f = b2f(expr cmp 0)
>
> will generate better code by pretending the expression is
>
>     f = ir_triop_sel(0.0, 1.0, expr cmp 0)
>
> This is because the last instruction of "expr" can generate the
> condition code for the "cmp 0".  This avoids having to do the "-(b & 1)"
> trick to generate 0u or ~0u for the Boolean result.  This means code like
>
>     mov(16) g16<1>F 1F
>     mul.ge.f0(16) null g6<8,8,1>F g14<8,8,1>F
>     (+f0) sel(16) m6<1>F g16<8,8,1>F 0F
>
> will be generated instead of
>
>     mul(16) g2<1>F g12<8,8,1>F g4<8,8,1>F
>     cmp.ge.f0(16) g2<1>D g4<8,8,1>F 0F

Presumably this g4 should be g2?

>     and(16) g4<1>D g2<8,8,1>D 1D
>     and(16) m6<1>D -g4<8,8,1>D 0x3f800000UD
>
> v2: When the comparison is either == 0.0 or != 0.0 use the knowledge
> that the true (or false) case already results in zero would allow better
> code generation by possibly avoiding a load-immediate instruction.
>
> v3: Apply the optimization even when neither comparitor is zero.
>
> Shader-db results:
>
> GM45 (0x2A42):
> total instructions in shared programs: 3551002 -> 3550829 (-0.00%)
> instructions in affected programs:     33269 -> 33096 (-0.52%)
> helped:                                121
>
> Iron Lake (0x0046):
> total instructions in shared programs: 4993327 -> 4993146 (-0.00%)
> instructions in affected programs:     34199 -> 34018 (-0.53%)
> helped:                                129
>
> No change on other platforms.
>
> Signed-off-by: Ian Romanick <ian.d.roman...@intel.com>
> Cc: Tapani Palli <tapani.pa...@intel.com>
> ---
>  src/mesa/drivers/dri/i965/brw_fs.h           |   2 +
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 101 +++++++++++++++++++++++++--
>  2 files changed, 99 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
> index d9d5858..075e90c 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -307,6 +307,7 @@ public:
>                     const fs_reg &a);
>     void emit_minmax(enum brw_conditional_mod conditionalmod, const fs_reg &dst,
>                      const fs_reg &src0, const fs_reg &src1);
> +   bool try_emit_b2f_of_comparison(ir_expression *ir);
>     bool try_emit_saturate(ir_expression *ir);
>     bool try_emit_line(ir_expression *ir);
>     bool try_emit_mad(ir_expression *ir);
> @@ -317,6 +318,7 @@ public:
>     bool opt_saturate_propagation();
>     bool opt_cmod_propagation();
>     void emit_bool_to_cond_code(ir_rvalue *condition);
> +   void emit_bool_to_cond_code_of_reg(ir_expression *expr, fs_reg op[3]);
>     void emit_if_gen6(ir_if *ir);
>     void emit_unspill(bblock_t *block, fs_inst *inst, fs_reg reg,
>                       uint32_t spill_offset, int count);
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index 3025a9d..3d79796 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -475,6 +475,87 @@ fs_visitor::try_emit_mad(ir_expression *ir)
>     return true;
>  }
>
> +bool
> +fs_visitor::try_emit_b2f_of_comparison(ir_expression *ir)
> +{
> +   /* On platforms that do not natively generate 0u and ~0u for Boolean
> +    * results, b2f expressions that look like
> +    *
> +    *     f = b2f(expr cmp 0)
> +    *
> +    * will generate better code by pretending the expression is
> +    *
> +    *     f = ir_triop_csel(0.0, 1.0, expr cmp 0)
> +    *
> +    * This is because the last instruction of "expr" can generate the
> +    * condition code for the "cmp 0".  This avoids having to do the "-(b & 1)"
> +    * trick to generate 0u or ~0u for the Boolean result.  This means code like
> +    *
> +    *     mov(16) g16<1>F 1F
> +    *     mul.ge.f0(16) null g6<8,8,1>F g14<8,8,1>F
> +    *     (+f0) sel(16) m6<1>F g16<8,8,1>F 0F
> +    *
> +    * will be generated instead of
> +    *
> +    *     mul(16) g2<1>F g12<8,8,1>F g4<8,8,1>F
> +    *     cmp.ge.f0(16) g2<1>D g4<8,8,1>F 0F
> +    *     and(16) g4<1>D g2<8,8,1>D 1D
> +    *     and(16) m6<1>D -g4<8,8,1>D 0x3f800000UD
> +    *
> +    * When the comparison is either == 0.0 or != 0.0 using the knowledge that
> +    * the true (or false) case already results in zero would allow better code
> +    * generation by possibly avoiding a load-immediate instruction.
> +    */
> +   ir_expression *cmp = ir->operands[0]->as_expression();
> +   if (cmp == NULL)
> +      return false;
> +
> +   if (cmp->operation == ir_binop_equal || cmp->operation == ir_binop_nequal) {
> +      for (unsigned i = 0; i < 2; i++) {
> +         ir_constant *c = cmp->operands[i]->as_constant();
> +         if (c == NULL || !c->is_zero())
> +            continue;
> +
> +         ir_expression *expr = cmp->operands[i ^ 1]->as_expression();
> +         if (expr != NULL) {
> +            fs_reg op[2];
> +
> +            for (unsigned j = 0; j < 2; j++) {
> +               cmp->operands[j]->accept(this);
> +               op[j] = this->result;
> +
> +               resolve_ud_negate(&op[j]);
> +            }
> +
> +            emit_bool_to_cond_code_of_reg(cmp, op);
> +
> +            /* In this case we know when the condition is true, op[i ^ 1]
> +             * contains zero.  Invert the predicate, use op[i ^ 1] as src0,
> +             * and immediate 1.0f as src1.
> +             */
> +            this->result = vgrf(ir->type);
> +            op[i ^ 1].type = BRW_REGISTER_TYPE_F;

We just do op[1 - i] in tons of other places. No comment needed to
explain 1-i.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
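
For anyone following the thread without the hardware docs at hand, here is a minimal C++ sketch of the two lowerings the commit message compares, plus the index idiom from the review comment. It is only an illustration on the CPU, not driver code: the function names b2f_mask, b2f_sel and other_operand are hypothetical and do not exist in Mesa, and the sketch assumes (as the "-(b & 1)" trick in the commit message implies) that only the low bit of the comparison result is meaningful on these platforms.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not Mesa API.  Models the old sequence:
//   and g4, g2, 1            -> keep only the low bit of the compare result
//   and m6, -g4, 0x3f800000  -> -(b & 1) is 0u or ~0u; mask the bits of 1.0f
static float b2f_mask(uint32_t cmp_result)
{
    uint32_t mask = 0u - (cmp_result & 1u);   // 0u or ~0u
    uint32_t bits = mask & 0x3f800000u;       // IEEE-754 bit pattern of 1.0f, or 0
    float f;
    std::memcpy(&f, &bits, sizeof f);         // reinterpret the bits as a float
    return f;
}

// Hypothetical helper, not Mesa API.  Models the new sequence: the condition
// code drives a predicated select between 1.0f and 0.0f directly.
static float b2f_sel(bool cond)
{
    return cond ? 1.0f : 0.0f;
}

// For an index restricted to 0 or 1, "i ^ 1" and "1 - i" pick the same
// (other) operand, which is why either spelling works in the patch.
static unsigned other_operand(unsigned i)
{
    return i ^ 1u;
}
```

Both b2f variants return 1.0f for a true comparison and 0.0f for a false one; the select form simply skips the two AND instructions, which is where the instruction savings in the shader-db numbers would come from.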