This looks good to me.

Marek

On Thu, May 8, 2014 at 3:18 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> Previously, ir_unop_any was implemented via a dot-product call, which
> uses floating point multiplication and addition. The multiplication was
> completely pointless, and the addition can just as well be done with an
> or. Since we know that the inputs are booleans, they must already be in
> canonical 0/~0 format, and the final SNE can also be avoided.
>
> Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
> ---
>
> I need to take this through a full piglit run, but the basic tests seem to
> work out as expected. This is the result of a compilation of
> fs-op-eq-mat4-mat4:
>
> FRAG
> PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
> DCL OUT[0], COLOR
> DCL CONST[0..7]
> DCL TEMP[0..4], LOCAL
> IMM[0] FLT32 { 0.0000, 1.0000, 0.0000, 0.0000}
>   0: MOV TEMP[0].yzw, IMM[0].xxxx
>   1: FSNE TEMP[1], CONST[4], CONST[0]
>   2: OR TEMP[1].x, TEMP[1].xxxx, TEMP[1].yyyy
>   3: OR TEMP[1].y, TEMP[1].zzzz, TEMP[1].wwww
>   4: OR TEMP[1].x, TEMP[1].xxxx, TEMP[1].yyyy
>   5: FSNE TEMP[2], CONST[5], CONST[1]
>   6: OR TEMP[2].x, TEMP[2].xxxx, TEMP[2].yyyy
>   7: OR TEMP[2].y, TEMP[2].zzzz, TEMP[2].wwww
>   8: OR TEMP[2].x, TEMP[2].xxxx, TEMP[2].yyyy
>   9: FSNE TEMP[3], CONST[6], CONST[2]
>  10: OR TEMP[3].x, TEMP[3].xxxx, TEMP[3].yyyy
>  11: OR TEMP[3].y, TEMP[3].zzzz, TEMP[3].wwww
>  12: OR TEMP[3].x, TEMP[3].xxxx, TEMP[3].yyyy
>  13: FSNE TEMP[4], CONST[7], CONST[3]
>  14: OR TEMP[4].x, TEMP[4].xxxx, TEMP[4].yyyy
>  15: OR TEMP[4].y, TEMP[4].zzzz, TEMP[4].wwww
>  16: OR TEMP[4].x, TEMP[4].xxxx, TEMP[4].yyyy
>  17: OR TEMP[1].x, TEMP[1].xxxx, TEMP[4].xxxx <---
>  18: OR TEMP[1].x, TEMP[1], TEMP[3].xxxx <---
>  19: OR TEMP[1].x, TEMP[1], TEMP[2].xxxx <---
>  20: NOT TEMP[1].x, TEMP[1].xxxx
>  21: AND TEMP[0].x, TEMP[1].xxxx, IMM[0].yyyy
>  22: MOV OUT[0], TEMP[0]
>  23: END
>
> The three instructions with arrows are the result of my new logic. I wonder if
> it's cause for concern that I'm not setting a swizzle mask on the
> src... probably a bit, but it works out here. Is there a "writemask ->
> swizzle" converter somewhere? The old instructions would have been
>
>   DP4 TEMP[1], TEMP[1], TEMP[1]
>   SNE TEMP[1], TEMP[1], IMM[0] ( == 0.0)
>
> Or something along those lines. While 1 instruction less in TGSI, at least
> nv50/nvc0 are scalar and would have had to implement DP4 as
>
>   mul
>   mul-add
>   mul-add
>   mul-add
>
> versus the much more scalar-friendly OR's (in addition to the final SNE being
> gone).
>
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 75 ++++++++++++++++++++----------
>  1 file changed, 51 insertions(+), 24 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index bdee1f4..2afd8fb 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -1671,30 +1671,57 @@ glsl_to_tgsi_visitor::visit(ir_expression *ir)
>     case ir_unop_any: {
>        assert(ir->operands[0]->type->is_vector());
>
> -      /* After the dot-product, the value will be an integer on the
> -       * range [0,4]. Zero stays zero, and positive values become 1.0.
> -       */
> -      glsl_to_tgsi_instruction *const dp =
> -         emit_dp(ir, result_dst, op[0], op[0],
> -                 ir->operands[0]->type->vector_elements);
> -      if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB &&
> -          result_dst.type == GLSL_TYPE_FLOAT) {
> -         /* The clamping to [0,1] can be done for free in the fragment
> -          * shader with a saturate.
> -          */
> -         dp->saturate = true;
> -      } else if (result_dst.type == GLSL_TYPE_FLOAT) {
> -         /* Negating the result of the dot-product gives values on the range
> -          * [-4, 0]. Zero stays zero, and negative values become 1.0. This
> -          * is achieved using SLT.
> -          */
> -         st_src_reg slt_src = result_src;
> -         slt_src.negate = ~slt_src.negate;
> -         emit(ir, TGSI_OPCODE_SLT, result_dst, slt_src, st_src_reg_for_float(0.0));
> -      }
> -      else {
> -         /* Use SNE 0 if integers are being used as boolean values. */
> -         emit(ir, TGSI_OPCODE_SNE, result_dst, result_src, st_src_reg_for_int(0));
> +      if (native_integers) {
> +         st_src_reg accum = op[0];
> +         accum.swizzle = SWIZZLE_XXXX;
> +         /* OR all the components together, since they should be either 0 or ~0
> +          */
> +         assert(ir->operands[0]->type->is_boolean());
> +         switch (ir->operands[0]->type->vector_elements) {
> +         case 4:
> +            op[0].swizzle = SWIZZLE_WWWW;
> +            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
> +            accum = st_src_reg(result_dst);
> +            /* fallthrough */
> +         case 3:
> +            op[0].swizzle = SWIZZLE_ZZZZ;
> +            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
> +            accum = st_src_reg(result_dst);
> +            /* fallthrough */
> +         case 2:
> +            op[0].swizzle = SWIZZLE_YYYY;
> +            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
> +            break;
> +         default:
> +            assert(!"Unexpected vector size");
> +            break;
> +         }
> +      } else {
> +         /* After the dot-product, the value will be an integer on the
> +          * range [0,4]. Zero stays zero, and positive values become 1.0.
> +          */
> +         glsl_to_tgsi_instruction *const dp =
> +            emit_dp(ir, result_dst, op[0], op[0],
> +                    ir->operands[0]->type->vector_elements);
> +         if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB &&
> +             result_dst.type == GLSL_TYPE_FLOAT) {
> +            /* The clamping to [0,1] can be done for free in the fragment
> +             * shader with a saturate.
> +             */
> +            dp->saturate = true;
> +         } else if (result_dst.type == GLSL_TYPE_FLOAT) {
> +            /* Negating the result of the dot-product gives values on the range
> +             * [-4, 0]. Zero stays zero, and negative values become 1.0. This
> +             * is achieved using SLT.
> +             */
> +            st_src_reg slt_src = result_src;
> +            slt_src.negate = ~slt_src.negate;
> +            emit(ir, TGSI_OPCODE_SLT, result_dst, slt_src, st_src_reg_for_float(0.0));
> +         }
> +         else {
> +            /* Use SNE 0 if integers are being used as boolean values. */
> +            emit(ir, TGSI_OPCODE_SNE, result_dst, result_src, st_src_reg_for_int(0));
> +         }
>        }
>        break;
>     }
> --
> 1.8.3.2
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
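
For anyone reading this in the archive, the reasoning in the quoted commit message can be checked on the CPU with a small standalone sketch. It is not Mesa code: the names any_via_or, any_via_dot and paths_agree_for_vec4 are hypothetical, the integer path assumes booleans are stored as 0/~0 as the patch states, and the float path models the dot-product fallback with booleans stored as 0.0/1.0.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Canonical boolean encoding assumed by the patch: false == 0, true == ~0.
constexpr std::uint32_t kFalse = 0u;
constexpr std::uint32_t kTrue = ~0u;

// Hypothetical model of the new path: OR the components together.
// Because every input is 0 or ~0, the result is already canonical and
// no trailing "set if not equal to zero" step is needed.
template <std::size_t N>
std::uint32_t any_via_or(const std::array<std::uint32_t, N> &v)
{
   static_assert(N >= 2 && N <= 4, "Unexpected vector size");
   std::uint32_t accum = v[0];
   assert(accum == kFalse || accum == kTrue);
   for (std::size_t i = 1; i < N; ++i) {
      assert(v[i] == kFalse || v[i] == kTrue);
      accum |= v[i];
   }
   return accum;
}

// Hypothetical model of the float fallback: dot(v, v) lands in [0, N],
// and a final compare against zero turns it back into a boolean.
template <std::size_t N>
bool any_via_dot(const std::array<float, N> &v)
{
   float dot = 0.0f;
   for (std::size_t i = 0; i < N; ++i)
      dot += v[i] * v[i];
   return dot != 0.0f;
}

// Exhaustively compare both models over all 16 possible bvec4 inputs.
bool paths_agree_for_vec4()
{
   for (unsigned bits = 0; bits < 16; ++bits) {
      std::array<std::uint32_t, 4> b{};
      std::array<float, 4> f{};
      for (std::size_t i = 0; i < 4; ++i) {
         const bool set = ((bits >> i) & 1u) != 0u;
         b[i] = set ? kTrue : kFalse;
         f[i] = set ? 1.0f : 0.0f;
      }
      const std::uint32_t expected = any_via_dot(f) ? kTrue : kFalse;
      if (any_via_or(b) != expected)
         return false;
   }
   return true;
}
```

The OR model needs N-1 integer operations and no final compare, which is the saving the commit message describes for scalar hardware. The sketch says nothing about swizzles or writemasks, so it does not address the open question in the patch notes.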