From: Ian Romanick <ian.d.roman...@intel.com> This can increase register pressure. I think there may be some ways to mitigate this, but that will take more work.
By percentage of instructions reduced, the most helped shader had a block of code like: cmp.z.f0(8) g51<1>D g21<8,8,1>D 1D (+f0) sel(8) g23<1>UD g4.1<0,1,0>UD g4<0,1,0>UD cmp.z.f0(8) g24<1>D g21<8,8,1>D 2D (+f0) sel(8) g25<1>UD g4.2<0,1,0>UD g23<8,8,1>UD cmp.z.f0(8) g26<1>D g21<8,8,1>D 3D (+f0) sel(8) g17<1>UD g4.3<0,1,0>UD g25<8,8,1>UD cmp.nz.f0(8) null<1>D g51<8,8,1>D 0D (+f0) sel(8) g28<1>UD g4.5<0,1,0>UD g4.4<0,1,0>UD cmp.nz.f0(8) null<1>D g24<8,8,1>D 0D (+f0) sel(8) g29<1>UD g4.6<0,1,0>UD g28<8,8,1>UD cmp.nz.f0(8) null<1>D g26<8,8,1>D 0D (+f0) sel(8) g18<1>UD g4.7<0,1,0>UD g29<8,8,1>UD cmp.nz.f0(8) null<1>D g51<8,8,1>D 0D (+f0) sel(8) g31<1>UD g5.1<0,1,0>UD g5<0,1,0>UD cmp.nz.f0(8) null<1>D g24<8,8,1>D 0D (+f0) sel(8) g32<1>UD g5.2<0,1,0>UD g31<8,8,1>UD cmp.nz.f0(8) null<1>D g26<8,8,1>D 0D (+f0) sel(8) g19<1>UD g5.3<0,1,0>UD g32<8,8,1>UD All of the cmp.nz instructions could be eliminated by just moving the selects. Interestingly, this shader ends up looking like: cmp.z.f0(8) g51<1>D g28<8,8,1>D 1D (+f0) sel(8) g32<1>UD g4.1<0,1,0>UD g4<0,1,0>UD csel.nz(8) g21<1>F g4.5<0,1,0>F g4.4<0,1,0>F g51<4,4,1>F csel.nz(8) g31<1>F g5.1<0,1,0>F g5.0<0,1,0>F g51<4,4,1>F cmp.z.f0(8) g24<1>D g28<8,8,1>D 2D (+f0) sel(8) g25<1>UD g4.2<0,1,0>UD g32<8,8,1>UD csel.nz(8) g29<1>F g4.6<0,1,0>F g21<4,4,1>F g24<4,4,1>F csel.nz(8) g23<1>F g5.2<0,1,0>F g31<4,4,1>F g24<4,4,1>F cmp.z.f0(8) g26<1>D g28<8,8,1>D 3D (+f0) sel(8) g17<1>UD g4.3<0,1,0>UD g25<8,8,1>UD csel.nz(8) g18<1>F g4.7<0,1,0>F g29<4,4,1>F g26<4,4,1>F csel.nz(8) g19<1>F g5.3<0,1,0>F g23<4,4,1>F g26<4,4,1>F At this point, we may as well convert the CSELs back to regular SELs. Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14398211 -> 14395514 (-0.02%) instructions in affected programs: 312588 -> 309891 (-0.86%) helped: 1173 HURT: 1 helped stats (abs) min: 1 max: 66 x̄: 4.04 x̃: 3 helped stats (rel) min: 0.13% max: 13.95% x̄: 1.80% x̃: 1.36% HURT stats (abs) min: 2046 max: 2046 x̄: 2046.00 x̃: 2046 HURT stats (rel) min: 33.71% max: 33.71% x̄: 33.71% x̃: 33.71% 95% mean confidence interval for instructions value: -5.73 1.14 95% mean confidence interval for instructions %-change: -1.87% -1.67% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 532959388 -> 532813589 (-0.03%) cycles in affected programs: 4824963 -> 4679164 (-3.02%) helped: 963 HURT: 175 helped stats (abs) min: 1 max: 18800 x̄: 175.14 x̃: 85 helped stats (rel) min: 0.02% max: 57.14% x̄: 7.96% x̃: 5.77% HURT stats (abs) min: 1 max: 13496 x̄: 130.63 x̃: 14 HURT stats (rel) min: 0.05% max: 60.02% x̄: 2.82% x̃: 1.20% 95% mean confidence interval for cycles value: -178.35 -77.89 95% mean confidence interval for cycles %-change: -6.83% -5.77% Cycles are helped. total spills in shared programs: 8044 -> 8329 (3.54%) spills in affected programs: 363 -> 648 (78.51%) helped: 0 HURT: 1 total fills in shared programs: 10950 -> 11272 (2.94%) fills in affected programs: 512 -> 834 (62.89%) helped: 0 HURT: 1 LOST: 2 GAINED: 2 No changes on pre-Gen8 platforms because they all lack the CSEL instruction. Signed-off-by: Ian Romanick <ian.d.roman...@intel.com> --- There is still quite a bit of potential future work here. A lot of the cases that were helped by this were things like cmp.z.f0 g51F g21F g20F ... lots of instructions cmp.nz.f0 nullD g51D 0D (+f0) sel g28UD g45UD g44UD If this pattern is the only use of the result from the first compare, we could convert this to add.z.f0 g51F -g21F g20F ... lots of instructions csel.nz g28F g45F g44F g51F I also saw a number of cases like and g51D g21D 7D ... cmp.nz.f0 nullD g51D 0D (+f0) sel g28UD g45UD g44UD This pass does not convert those to use CSEL, but, assuming we can determine the denorm behavior, it seems like it could. As alluded in the commit message, we may also want a pass the converts CSEL instructions back into SEL instructions. This may help with register pressure. src/intel/compiler/brw_fs.cpp | 56 +++++++++++++++++++++++++++++++++++++++---- 1 file changed, 51 insertions(+), 5 deletions(-) diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp index a4086a8dd7f..f683b4fd4d0 100644 --- a/src/intel/compiler/brw_fs.cpp +++ b/src/intel/compiler/brw_fs.cpp @@ -2903,13 +2903,54 @@ fs_visitor::opt_peephole_csel() scan_inst->predicate != BRW_PREDICATE_NONE || (scan_inst->src[0].file != VGRF && scan_inst->src[0].file != ATTR && - scan_inst->src[0].file != UNIFORM) || - scan_inst->src[0].type != BRW_REGISTER_TYPE_F) + scan_inst->src[0].file != UNIFORM)) break; if (scan_inst->opcode == BRW_OPCODE_CMP && !scan_inst->src[1].is_zero()) break; + if (scan_inst->opcode == BRW_OPCODE_MOV && + scan_inst->src[0].type != BRW_REGISTER_TYPE_F) + break; + + /* If the comparison is among integer values, it can still work if + * the value being compared is itself the result of a comparison. + * Such a result will either be 0 or 0xffffffff. The latter is + * -NaN, so can only be used with CSEL.Z or CSEL.NZ. + */ + bool cond_is_bool = false; + if (scan_inst->src[0].type != BRW_REGISTER_TYPE_F) { + assert(scan_inst->opcode == BRW_OPCODE_CMP); + + if (scan_inst->conditional_mod != BRW_CONDITIONAL_Z && + scan_inst->conditional_mod != BRW_CONDITIONAL_NZ) + break; + + if (scan_inst->src[0].abs || scan_inst->src[0].negate) + break; + + foreach_inst_in_block_reverse_starting_from(fs_inst, + scan_more_inst, + scan_inst) { + if (regions_overlap(scan_more_inst->dst, + scan_more_inst->size_written, + scan_inst->src[0], + scan_inst->size_read(0))) { + if (scan_more_inst->is_partial_write() || + scan_more_inst->dst.offset != scan_inst->src[0].offset || + scan_more_inst->exec_size != scan_inst->exec_size || + scan_more_inst->opcode != BRW_OPCODE_CMP) + break; + + cond_is_bool = true; + break; + } + } + + if (!cond_is_bool) + break; + } + const brw::fs_builder ibld(this, block, inst); const enum brw_conditional_mod cond = @@ -2923,7 +2964,8 @@ fs_visitor::opt_peephole_csel() csel_inst = ibld.CSEL(inst->dst, inst->src[0], inst->src[1], - scan_inst->src[0], + retype(scan_inst->src[0], + BRW_REGISTER_TYPE_F), cond); } else if (cond == BRW_CONDITIONAL_NZ) { /* Consider the sequence @@ -2945,10 +2987,14 @@ fs_visitor::opt_peephole_csel() csel_inst = ibld.CSEL(inst->dst, inst->src[0], scan_inst->src[0], - scan_inst->src[0], + retype(scan_inst->src[0], + BRW_REGISTER_TYPE_F), cond); - csel_inst->src[1].abs = true; + /* If the condition is a Boolean, then it is either 0 or ~0. + * The abs is unnecessary in this case. + */ + csel_inst->src[1].abs = !cond_is_bool; } if (csel_inst != NULL) { -- 2.14.4 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev