On Wed, Mar 11, 2015 at 1:44 PM, Ian Romanick <i...@freedesktop.org> wrote: > From: Ian Romanick <ian.d.roman...@intel.com> > > Espically on platforms that do not natively generate 0u and ~0u for > Boolean results, we generate a lot of sequences where a CMP is > followed by an AND with 1. emit_bool_to_cond_code does this, for > example. On ILK, this results in a sequence like: > > add(8) g3<1>F g8<8,8,1>F -g4<0,1,0>F > cmp.l.f0(8) g3<1>D g3<8,8,1>F 0F > and.nz.f0(8) null g3<8,8,1>D 1D > (+f0) iff(8) Jump: 6 > > The AND.nz is obviously redundant. By propagating the cmod, we can > instead generate > > add.l.f0(8) null g8<8,8,1>F -g4<0,1,0>F > (+f0) iff(8) Jump: 6 > > Existing code already handles the propagation from the CMP to the ADD. > > Shader-db results: > > GM45 (0x2A42): > total instructions in shared programs: 3550829 -> 3550788 (-0.00%) > instructions in affected programs: 10028 -> 9987 (-0.41%) > helped: 24 > > Iron Lake (0x0046): > total instructions in shared programs: 4993146 -> 4993105 (-0.00%) > instructions in affected programs: 9675 -> 9634 (-0.42%) > helped: 24 > > Ivy Bridge (0x0166): > total instructions in shared programs: 6291870 -> 6291794 (-0.00%) > instructions in affected programs: 17914 -> 17838 (-0.42%) > helped: 48 > > Haswell (0x0426): > total instructions in shared programs: 5779256 -> 5779180 (-0.00%) > instructions in affected programs: 16694 -> 16618 (-0.46%) > helped: 48 > > Broadwell (0x162E): > total instructions in shared programs: 6823088 -> 6823014 (-0.00%) > instructions in affected programs: 15824 -> 15750 (-0.47%) > helped: 46 > > No chage on Sandy Bridge or on any platform when NIR is used.
typo: change > > Signed-off-by: Ian Romanick <ian.d.roman...@intel.com> > Cc: Matt Turner <matts...@gmail.com> > --- > .../drivers/dri/i965/brw_fs_cmod_propagation.cpp | 32 > +++++++++++++++++++++- > 1 file changed, 31 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp > b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp > index d0ca2f9..6df3d45 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp > @@ -57,7 +57,8 @@ opt_cmod_propagation_local(bblock_t *block) > foreach_inst_in_block_reverse_safe(fs_inst, inst, block) { > ip--; > > - if ((inst->opcode != BRW_OPCODE_CMP && > + if ((inst->opcode != BRW_OPCODE_AND && > + inst->opcode != BRW_OPCODE_CMP && > inst->opcode != BRW_OPCODE_MOV) || > inst->predicate != BRW_PREDICATE_NONE || > !inst->dst.is_null() || > @@ -65,6 +66,19 @@ opt_cmod_propagation_local(bblock_t *block) > inst->src[0].abs) > continue; > > + /* Only an AND.NZ can be propagated. Many AND.Z instructions are > + * generated (for ir_unop_not in fs_visitor::emit_bool_to_cond_code). > + * Propagating those would require inverting the condition on the CMP. > + * This changes both the flag value and the register destination of the > + * CMP. That result may be used elsewhere, so we can't change its > value > + * on a whim. > + */ > + if (inst->opcode == BRW_OPCODE_AND && > + !(inst->src[1].is_one() && > + inst->conditional_mod == BRW_CONDITIONAL_NZ && > + !inst->src[0].negate)) I had a hard time reading this condition, but I think it is right. > + continue; > + > if (inst->opcode == BRW_OPCODE_CMP && !inst->src[1].is_zero()) > continue; > > @@ -80,6 +94,22 @@ opt_cmod_propagation_local(bblock_t *block) > scan_inst->dst.reg_offset != inst->src[0].reg_offset) > break; > > + /* This must be done before the dst.type check because the result > + * type of the AND will always be D, but the result of the CMP > + * could be anything. The assumption is that the AND is just > + * figuring out what the result of the previous comparison was > + * instead of doing a new comparison with a different type. > + */ > + if (inst->opcode == BRW_OPCODE_AND) { > + if (scan_inst->opcode == BRW_OPCODE_CMP && > + scan_inst->writes_flag()) { CMPs will always writes the flag, so no need to check that. I guess you could assert it or something if you wanted. This looks good... we just need some additional unit tests :) - Test that you can remove the AND.NZ after a CMP null - Test that AND.Z isn't removed I don't think there's any other interesting case to test? _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev