On Wed, Nov 4, 2015 at 1:01 PM, Alejandro Piñeiro <apinhe...@igalia.com> wrote: > On 04/11/15 20:13, Matt Turner wrote: >> On Fri, Oct 23, 2015 at 7:17 AM, Alejandro Piñeiro <apinhe...@igalia.com> >> wrote: >>> Equivalent to commit 4eebeb but with sel operations. In this case >> That commit sha doesn't exist. I suspect you meant commit 8ac3b525c. > > Yes, probably that sha was the one on my wip branches. > >>> we select the PredCtrl based on the writemask. >>> >>> This change allows cmod propagation to optimize out several >>> instructions. >> Okay, so this patch helps a case like: > > I don't think that the case you pointed is a good one to explain why > this is helping. Because this seems to suggest that this case is without > the patch but... > >> >> cmp.g.f0.0 vgrf32.0.x:F, vgrf31.xxxx:F, 0.000000F >> cmp.nz.f0.0 null.x:D, vgrf32.xxxx:D, 0D > ... this patch is needed to arrive to this point (null destination > writemask being updated). >> (+f0.0) sel vgrf2.0.x:F, |vgrf27.xxxx|:F, 0.000000F >> >> where our code thinks the SEL reads all four channels of the flag >> register when it actually only reads .x because of the writemask. >> >> Adding something like this to the commit message would be good. > > So if we want to go into the details of why this change helps to reduce > instructions, what about something like: > > "This patch helps on cases like this: > > 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F > 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D > 3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD > > In this case, cmod propagation can't optimize instruction #2, because > instructions #1 and #2 have different writemasks, and we can't update > directly instruction #2 writemask because our code thinks that sel reads > all four channels of the flag, when it actually only reads .x. > > So, with this patch, the previous case becames this: > > 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F > 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D > 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD > > Now only the x channel of the flag is used, allowing dead code eliminate > to update the writemask at the second instruction: > 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F > 2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D > 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD > > So now cmod propagation can simplify out #2: > 1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F > 2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD > " > > Although not sure if this explanation is too long. > >> >>> Shader-db numbers: >>> total instructions in shared programs: 6235835 -> 6228008 (-0.13%) >>> instructions in affected programs: 219850 -> 212023 (-3.56%) >>> total loops in shared programs: 1979 -> 1979 (0.00%) >>> helped: 1192 >>> HURT: 0 >>> --- >>> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 18 +++++++++++++++++- >>> 1 file changed, 17 insertions(+), 1 deletion(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> index 0f04f65..bc86be6 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>> @@ -1437,8 +1437,24 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) >>> case nir_op_bcsel: >>> emit(CMP(dst_null_d(), op[0], src_reg(0), BRW_CONDITIONAL_NZ)); >>> inst = emit(BRW_OPCODE_SEL, dst, op[1], op[2]); >>> - inst->predicate = BRW_PREDICATE_NORMAL; >>> + switch (dst.writemask) { >>> + case WRITEMASK_X: >>> + inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_X; >>> + break; >>> + case WRITEMASK_Y: >>> + inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_Y; >>> + break; >>> + case WRITEMASK_Z: >>> + inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_Z; >>> + break; >>> + case WRITEMASK_W: >>> + inst->predicate = BRW_PREDICATE_ALIGN16_REPLICATE_W; >>> + break; >>> + default: >>> + inst->predicate = BRW_PREDICATE_NORMAL; >>> break; >> This break, and the one two lines below it are indented incorrectly. > > Ok. > >>> + } >>> + break; >>> >>> case nir_op_fdot_replicated2: >>> inst = emit(BRW_OPCODE_DP2, dst, op[0], op[1]); >>> -- >>> 2.1.4 >>> >> With the commit message fixed/updated and the breaks properly indented, > > As I changed totally the use case and description, I would prefer to > confirm that I explained myself properly before pushing.
Thanks. Your description is great -- it'll definitely help future readers to understand exactly why this is needed. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev