From: Ian Romanick <ian.d.roman...@intel.com>

Especially on platforms that do not natively generate 0u and ~0u for
Boolean results, we generate a lot of sequences where a CMP is followed
by an AND with 1. emit_bool_to_cond_code does this, for example. On
ILK, this results in a sequence like:

    add(8)          g3<1>F      g8<8,8,1>F      -g4<0,1,0>F
    cmp.l.f0(8)     g3<1>D      g3<8,8,1>F      0F
    and.nz.f0(8)    null        g3<8,8,1>D      1D
    (+f0) iff(8)    Jump: 6

The AND.nz is obviously redundant. By propagating the cmod, we can
instead generate

    add.l.f0(8)     null        g8<8,8,1>F      -g4<0,1,0>F
    (+f0) iff(8)    Jump: 6

Existing code already handles the propagation from the CMP to the ADD.

Shader-db results:

GM45 (0x2A42):
total instructions in shared programs: 3542267 -> 3541013 (-0.04%)
instructions in affected programs:     169385 -> 168131 (-0.74%)
helped:                                684
HURT:                                  0
GAINED:                                0
LOST:                                  0

Iron Lake (0x0046):
total instructions in shared programs: 4864611 -> 4863357 (-0.03%)
instructions in affected programs:     166050 -> 164796 (-0.76%)
helped:                                684
HURT:                                  0
GAINED:                                0
LOST:                                  0

Sandy Bridge (0x0116):
total instructions in shared programs: 6853550 -> 6853550 (0.00%)
instructions in affected programs:     0 -> 0
helped:                                0
HURT:                                  0
GAINED:                                0
LOST:                                  0

Ivy Bridge (0x0166):
total instructions in shared programs: 6324560 -> 6324484 (-0.00%)
instructions in affected programs:     18283 -> 18207 (-0.42%)
helped:                                48
HURT:                                  0
GAINED:                                0
LOST:                                  0

Haswell (0x0426):
total instructions in shared programs: 5952024 -> 5951948 (-0.00%)
instructions in affected programs:     18208 -> 18132 (-0.42%)
helped:                                48
HURT:                                  0
GAINED:                                0
LOST:                                  0

Broadwell (0x162E):
total instructions in shared programs: 7040944 -> 7040870 (-0.00%)
instructions in affected programs:     17324 -> 17250 (-0.43%)
helped:                                46
HURT:                                  0
GAINED:                                0
LOST:                                  0

Signed-off-by: Ian Romanick <ian.d.roman...@intel.com>
---
 .../drivers/dri/i965/brw_fs_cmod_propagation.cpp | 26 +++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
index c6384ab..6d3a2f5 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
@@ -57,7 +57,8 @@ opt_cmod_propagation_local(fs_visitor *v, bblock_t *block)
    foreach_inst_in_block_reverse_safe(fs_inst, inst, block) {
       ip--;
 
-      if ((inst->opcode != BRW_OPCODE_CMP &&
+      if ((inst->opcode != BRW_OPCODE_AND &&
+           inst->opcode != BRW_OPCODE_CMP &&
            inst->opcode != BRW_OPCODE_MOV) ||
           inst->predicate != BRW_PREDICATE_NONE ||
           !inst->dst.is_null() ||
@@ -65,6 +66,19 @@ opt_cmod_propagation_local(fs_visitor *v, bblock_t *block)
           inst->src[0].abs)
          continue;
 
+      /* Only an AND.NZ can be propagated.  Many AND.Z instructions are
+       * generated (for ir_unop_not in fs_visitor::emit_bool_to_cond_code).
+       * Propagating those would require inverting the condition on the CMP.
+       * This changes both the flag value and the register destination of the
+       * CMP.  That result may be used elsewhere, so we can't change its value
+       * on a whim.
+       */
+      if (inst->opcode == BRW_OPCODE_AND &&
+          !(inst->src[1].is_one() &&
+            inst->conditional_mod == BRW_CONDITIONAL_NZ &&
+            !inst->src[0].negate))
+         continue;
+
       if (inst->opcode == BRW_OPCODE_CMP && !inst->src[1].is_zero())
          continue;
 
@@ -80,6 +94,16 @@ opt_cmod_propagation_local(fs_visitor *v, bblock_t *block)
                 scan_inst->dst.reg_offset != inst->src[0].reg_offset)
                break;
 
+            if (inst->opcode == BRW_OPCODE_AND) {
+               if (scan_inst->opcode == BRW_OPCODE_CMP &&
+                   scan_inst->writes_flag()) {
+                  inst->remove(block);
+                  progress = true;
+               }
+
+               break;
+            }
+
             /* If the instruction generating inst's source also wrote the
              * flag, and inst is doing a simple .nz comparison, then inst
              * is redundant - the appropriate value is already in the flag
-- 
2.1.0

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
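
For readers who want to experiment with the rule outside the driver, the AND.NZ check in the patch can be approximated with a small standalone model. This is only a sketch under stated assumptions: the `Op`, `Cmod`, and `Inst` types and the `and_nz_is_redundant` helper are hypothetical stand-ins invented for illustration, not Mesa's `fs_inst` API, and the model ignores partial writes, register offsets, and predication. It is also more conservative than the pass in one respect: it gives up on any intervening flag write.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the i965 IR. None of these types
// exist in Mesa; they only model what the check needs.
enum class Op { ADD, CMP, AND, MOV };
enum class Cmod { NONE, Z, NZ, L };

struct Inst {
    Op op;
    int dst;           // destination register number, or -1 for the null register
    int src0;          // register number of the first source
    bool src1_is_one;  // true when the second source is the immediate 1
    Cmod cmod;         // conditional modifier; NONE means the flag is not written
    bool src0_negate;  // true when the first source carries a negate modifier

    bool writes_flag() const { return cmod != Cmod::NONE; }
};

// Returns true when the instruction at and_idx is an "and.nz null, rX, 1"
// whose source rX was most recently written by a flag-writing CMP. In that
// case the flag already holds the value the AND would compute.
bool and_nz_is_redundant(const std::vector<Inst>& insts, std::size_t and_idx)
{
    assert(and_idx < insts.size());
    const Inst& inst = insts[and_idx];

    // Only AND.NZ with an immediate 1 and an unnegated source qualifies.
    if (inst.op != Op::AND || inst.dst != -1 || !inst.src1_is_one ||
        inst.cmod != Cmod::NZ || inst.src0_negate)
        return false;

    // Walk backwards to the nearest instruction that writes the AND's source.
    for (std::size_t i = and_idx; i-- > 0;) {
        const Inst& scan = insts[i];
        if (scan.dst == inst.src0)
            return scan.op == Op::CMP && scan.writes_flag();
        // Simplification: any unrelated flag write in between defeats the match.
        if (scan.writes_flag())
            return false;
    }
    return false;
}
```

Feeding it a model of the ILK sequence from the commit message (ADD, then CMP.l, then AND.nz against 1) reports the AND as removable, while switching the AND to `.z` does not, which mirrors the restriction explained in the patch comment.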