Previously, ir_unop_any was implemented via a dot-product call, which
uses floating-point multiplication and addition. The multiplication was
completely pointless, and the addition can just as well be done with an
OR. When native integers are in use, we know that the inputs are
booleans already in canonical 0/~0 format, so the final SNE can also be
avoided.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

I need to take this through a full piglit run, but the basic tests seem
to work out as expected. This is the result of a compilation of
fs-op-eq-mat4-mat4:

FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[0..7]
DCL TEMP[0..4], LOCAL
IMM[0] FLT32 { 0.0000, 1.0000, 0.0000, 0.0000}
  0: MOV TEMP[0].yzw, IMM[0].xxxx
  1: FSNE TEMP[1], CONST[4], CONST[0]
  2: OR TEMP[1].x, TEMP[1].xxxx, TEMP[1].yyyy
  3: OR TEMP[1].y, TEMP[1].zzzz, TEMP[1].wwww
  4: OR TEMP[1].x, TEMP[1].xxxx, TEMP[1].yyyy
  5: FSNE TEMP[2], CONST[5], CONST[1]
  6: OR TEMP[2].x, TEMP[2].xxxx, TEMP[2].yyyy
  7: OR TEMP[2].y, TEMP[2].zzzz, TEMP[2].wwww
  8: OR TEMP[2].x, TEMP[2].xxxx, TEMP[2].yyyy
  9: FSNE TEMP[3], CONST[6], CONST[2]
 10: OR TEMP[3].x, TEMP[3].xxxx, TEMP[3].yyyy
 11: OR TEMP[3].y, TEMP[3].zzzz, TEMP[3].wwww
 12: OR TEMP[3].x, TEMP[3].xxxx, TEMP[3].yyyy
 13: FSNE TEMP[4], CONST[7], CONST[3]
 14: OR TEMP[4].x, TEMP[4].xxxx, TEMP[4].yyyy
 15: OR TEMP[4].y, TEMP[4].zzzz, TEMP[4].wwww
 16: OR TEMP[4].x, TEMP[4].xxxx, TEMP[4].yyyy
 17: OR TEMP[1].x, TEMP[1].xxxx, TEMP[4].xxxx <---
 18: OR TEMP[1].x, TEMP[1], TEMP[3].xxxx <---
 19: OR TEMP[1].x, TEMP[1], TEMP[2].xxxx <---
 20: NOT TEMP[1].x, TEMP[1].xxxx
 21: AND TEMP[0].x, TEMP[1].xxxx, IMM[0].yyyy
 22: MOV OUT[0], TEMP[0]
 23: END

The three instructions with arrows are the result of my new logic. I
wonder if it's cause for concern that I'm not setting a swizzle mask on
the src... probably a bit, but it works out here. Is there a
"writemask -> swizzle" converter somewhere?

The old instructions would have been

  DP4 TEMP[1], TEMP[1], TEMP[1]
  SNE TEMP[1], TEMP[1], IMM[0] ( == 0.0)

Or something along those lines. While 1 instruction less in TGSI, at
least nv50/nvc0 are scalar and would have had to implement DP4 as
mul mul-add mul-add mul-add versus the much more scalar-friendly OR's
(in addition to the final SNE being gone).

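For reference, here is a standalone sketch of the reasoning above. It is not Mesa code. It models canonical 0/~0 booleans as plain uint32_t values and checks that an OR chain with the same fallthrough shape as the patch matches the plain definition of any(). The names any_reference, any_via_or and replicate_swizzle_for_writemask are all made up for illustration. The last one is only a guess at what a "writemask -> swizzle" converter could look like for a single-channel writemask, assuming bit 0 = x through bit 3 = w and a swizzle packed at 3 bits per component.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: with native integers, booleans are canonical (false = 0, true = ~0).
constexpr uint32_t BOOL_FALSE = 0u;
constexpr uint32_t BOOL_TRUE  = ~0u;

// Plain definition of any(): true if at least one component is true.
inline uint32_t any_reference(const uint32_t *v, int n)
{
   for (int i = 0; i < n; i++)
      if (v[i] != 0)
         return BOOL_TRUE;
   return BOOL_FALSE;
}

// OR-based reduction with the same fallthrough structure as the patch:
// start from .x, then OR in .w, .z and .y as the vector size requires.
inline uint32_t any_via_or(const uint32_t *v, int n)
{
   assert(n >= 2 && n <= 4);
   uint32_t accum = v[0];
   switch (n) {
   case 4:
      accum |= v[3];
      /* fallthrough */
   case 3:
      accum |= v[2];
      /* fallthrough */
   case 2:
      accum |= v[1];
      break;
   }
   // Inputs are 0 or ~0, so the OR result is already canonical: no SNE needed.
   return accum;
}

// Hypothetical helper (not an existing Mesa function): map a writemask with
// exactly one channel set to a swizzle that replicates that channel.
// Assumed packing: a | b << 3 | c << 6 | d << 9, with x = 0 ... w = 3.
inline unsigned replicate_swizzle_for_writemask(unsigned writemask)
{
   // Exactly one of the low four bits must be set.
   assert(writemask != 0 && writemask <= 0x8 &&
          (writemask & (writemask - 1)) == 0);
   unsigned chan = 0;
   while ((1u << chan) != writemask)
      chan++;
   return chan | (chan << 3) | (chan << 6) | (chan << 9);
}
```

With something like the helper above, the source operand for an accumulator that was just written with a .x writemask could be given an explicit .xxxx swizzle instead of relying on the default. Whether that is actually needed in the state tracker is the open question raised above.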
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 75 ++++++++++++++++++++----------
 1 file changed, 51 insertions(+), 24 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index bdee1f4..2afd8fb 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -1671,30 +1671,57 @@ glsl_to_tgsi_visitor::visit(ir_expression *ir)
    case ir_unop_any: {
       assert(ir->operands[0]->type->is_vector());
 
-      /* After the dot-product, the value will be an integer on the
-       * range [0,4]. Zero stays zero, and positive values become 1.0.
-       */
-      glsl_to_tgsi_instruction *const dp =
-            emit_dp(ir, result_dst, op[0], op[0],
-                    ir->operands[0]->type->vector_elements);
-      if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB &&
-          result_dst.type == GLSL_TYPE_FLOAT) {
-         /* The clamping to [0,1] can be done for free in the fragment
-          * shader with a saturate.
-          */
-         dp->saturate = true;
-      } else if (result_dst.type == GLSL_TYPE_FLOAT) {
-         /* Negating the result of the dot-product gives values on the range
-          * [-4, 0]. Zero stays zero, and negative values become 1.0. This
-          * is achieved using SLT.
-          */
-         st_src_reg slt_src = result_src;
-         slt_src.negate = ~slt_src.negate;
-         emit(ir, TGSI_OPCODE_SLT, result_dst, slt_src, st_src_reg_for_float(0.0));
-      }
-      else {
-         /* Use SNE 0 if integers are being used as boolean values. */
-         emit(ir, TGSI_OPCODE_SNE, result_dst, result_src, st_src_reg_for_int(0));
+      if (native_integers) {
+         st_src_reg accum = op[0];
+         accum.swizzle = SWIZZLE_XXXX;
+         /* OR all the components together, since they should be either 0 or ~0
+          */
+         assert(ir->operands[0]->type->is_boolean());
+         switch (ir->operands[0]->type->vector_elements) {
+         case 4:
+            op[0].swizzle = SWIZZLE_WWWW;
+            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
+            accum = st_src_reg(result_dst);
+            /* fallthrough */
+         case 3:
+            op[0].swizzle = SWIZZLE_ZZZZ;
+            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
+            accum = st_src_reg(result_dst);
+            /* fallthrough */
+         case 2:
+            op[0].swizzle = SWIZZLE_YYYY;
+            emit(ir, TGSI_OPCODE_OR, result_dst, accum, op[0]);
+            break;
+         default:
+            assert(!"Unexpected vector size");
+            break;
+         }
+      } else {
+         /* After the dot-product, the value will be an integer on the
+          * range [0,4]. Zero stays zero, and positive values become 1.0.
+          */
+         glsl_to_tgsi_instruction *const dp =
+               emit_dp(ir, result_dst, op[0], op[0],
+                       ir->operands[0]->type->vector_elements);
+         if (this->prog->Target == GL_FRAGMENT_PROGRAM_ARB &&
+             result_dst.type == GLSL_TYPE_FLOAT) {
+            /* The clamping to [0,1] can be done for free in the fragment
+             * shader with a saturate.
+             */
+            dp->saturate = true;
+         } else if (result_dst.type == GLSL_TYPE_FLOAT) {
+            /* Negating the result of the dot-product gives values on the range
+             * [-4, 0]. Zero stays zero, and negative values become 1.0. This
+             * is achieved using SLT.
+             */
+            st_src_reg slt_src = result_src;
+            slt_src.negate = ~slt_src.negate;
+            emit(ir, TGSI_OPCODE_SLT, result_dst, slt_src, st_src_reg_for_float(0.0));
+         }
+         else {
+            /* Use SNE 0 if integers are being used as boolean values. */
+            emit(ir, TGSI_OPCODE_SNE, result_dst, result_src, st_src_reg_for_int(0));
+         }
       }
       break;
    }
-- 
1.8.3.2

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev