On 04/11/15 23:49, Matt Turner wrote: > On Wed, Nov 4, 2015 at 1:01 PM, Alejandro Piñeiro <apinhe...@igalia.com> > wrote: >> On 04/11/15 20:13, Matt Turner wrote: >>> On Fri, Oct 23, 2015 at 7:17 AM, Alejandro Piñeiro <apinhe...@igalia.com> >>> wrote: >>>> Equivalent to commit 4eebeb but with sel operations. In this case >>> That commit sha doesn't exist. I suspect you meant commit 8ac3b525c. >> Yes, probably that sha was the one on my wip branches. >> >>>> we select the PredCtrl based on the writemask. >>>> >>>> This change allows cmod propagation to optimize out several >>>> instructions. >>> Okay, so this patch helps a case like: >> I don't think that the case you pointed is a good one to explain why >> this is helping. Because this seems to suggest that this case is without >> the patch but... >> >>> cmp.g.f0.0 vgrf32.0.x:F, vgrf31.xxxx:F, 0.000000F >>> cmp.nz.f0.0 null.x:D, vgrf32.xxxx:D, 0D >> ... this patch is needed to arrive to this point (null destination >> writemask being updated). >>> (+f0.0) sel vgrf2.0.x:F, |vgrf27.xxxx|:F, 0.000000F >>> >>> where our code thinks the SEL reads all four channels of the flag >>> register when it actually only reads .x because of the writemask. >>> >>> Adding something like this to the commit message would be good. >> So if we want to go into the details of why this change helps to reduce >> instructions, what about something like: >> >> "This patch helps on cases like this: >> >> 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F >> 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D >> 3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD >> >> In this case, cmod propagation can't optimize instruction #2, because >> instructions #1 and #2 have different writemasks, and we can't update >> directly instruction #2 writemask because our code thinks that sel reads >> all four channels of the flag, when it actually only reads .x. >> >> So, with this patch, the previous case becames this: >> >> 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F >> 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D >> 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD >> >> Now only the x channel of the flag is used, allowing dead code eliminate >> to update the writemask at the second instruction: >> 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F >> 2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D >> 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD >> >> So now cmod propagation can simplify out #2: >> 1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F >> 2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD >> " >> >> Although not sure if this explanation is too long. >> >>>> Shader-db numbers: >>>> total instructions in shared programs: 6235835 -> 6228008 (-0.13%) >>>> instructions in affected programs: 219850 -> 212023 (-3.56%) >>>> total loops in shared programs: 1979 -> 1979 (0.00%) >>>> helped: 1192 >>>> HURT: 0 >>>> --- >>>> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 18 +++++++++++++++++- >>>> 1 file changed, 17 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>>> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>>> index 0f04f65..bc86be6 100644 >>>> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>>> @@ -1437,8 +1437,24 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) >>>> case nir_op_bcsel: >>>> emit(CMP(dst_null_d(), op[0], src_reg(0), BRW_CONDITIONAL_NZ)); >>>> inst = emit(BRW_OPCODE_SEL, dst, op[1], op[2]); >>>> - inst->predicate = BRW_PREDICATE_NORMAL; >>>> + switch (dst.writemask) { >>>> + case WRITEMASK_X: >>>> + inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_X; >>>> + break; >>>> + case WRITEMASK_Y: >>>> + inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_Y; >>>> + break; >>>> + case WRITEMASK_Z: >>>> + inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_Z; >>>> + break; >>>> + case WRITEMASK_W: >>>> + inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_W; >>>> + break; >>>> + default: >>>> + inst->predicate = BRW_PREDICATE_NORMAL; >>>> break; >>> This break, and the one two lines below it are indented incorrectly. >> Ok. >> >>>> + } >>>> + break; >>>> >>>> case nir_op_fdot_replicated2: >>>> inst = emit(BRW_OPCODE_DP2, dst, op[0], op[1]); >>>> -- >>>> 2.1.4 >>>> >>> With the commit message fixed/updated and the breaks properly indented, >> As I changed totally the use case and description, I would prefer to >> confirm that I explained myself properly before pushing. > Thanks. Your description is great -- it'll definitely help future > readers to understand exactly why this is needed.
Patch pushed with that description (with minor tweaks). Thanks for the patience. BR -- Alejandro Piñeiro (apinhe...@igalia.com) _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev