On 14/10/15 10:15, Francisco Jerez wrote: > Alejandro Piñeiro <apinhe...@igalia.com> writes: > >> On 13/10/15 23:36, Matt Turner wrote: >>> On Tue, Oct 13, 2015 at 1:49 AM, Alejandro Piñeiro <apinhe...@igalia.com> >>> wrote: >>>> On 13/10/15 03:10, Matt Turner wrote: >>>>> Looks like this is causing an intermittent failure on HSW in our >>>>> Jenkins system (but I'm not able to reproduce locally) and a >>>>> consistent failure on G45. I'll investigate that. >>>> Ok, will hold on, and meanwhile I will try to reproduce the problem on >>>> HSW. Unfortunately I don't have any G45 available, so I will not be able >>>> to help on that. Thanks for taking a look there. >>> Okay, I think I see what's going on: >>> >>> --- good 2015-10-13 13:34:40.753357253 -0700 >>> +++ bad 2015-10-13 13:36:06.114290094 -0700 >>> -and(8) g6<1>.xD g6<4,4,1>.xD 1D { align16 >>> }; >>> -cmp.nz.f0(8) null -g6<4,4,1>.xD 0D { align16 >>> }; >>> +and.nz.f0(8) g6<1>.xD g6<4,4,1>.xD 1D { align16 >>> }; >>> >>> We're propagating a .nz conditional mod from a CMP with a null dest >>> (that has a writemask of .xyzw) to an AND that has a writemask of only >>> .x, so only the .x channel of the flag is now being updated. >> That is really common. And that is the idea behind the first patch of >> the series that changes how if's are emitted, and allowing to optimize >> if scan_inst->writemask is WRITEMASK_X. I will use the piglit test >> glsl-1.30/execution/switch/vs-pervertex.shader_test to explain this. >> >> If we use current master, we get the following assembly: >> 1: cmp.z.f0(8) g20<1>.xD g12<4,4,1>.xD g19<4,4,1>.xD { >> align16 1Q compacted }; >> 2: mov.nz.f0(8) null g20<4,4,1>.xD { >> align16 1Q }; >> 3: (+f0) if(8) JIP: 6 UIP: 6 { >> align16 1Q }; >> >> If we just simplify out the mov.nz at #2, the test would fail, because >> as in the example you mentioned, null has a XYZW writemask, but g19 a >> writemask X. So this optimized version, f0 only have x updated, and #3 >> is doing a normal if, that uses all the channels. >> >> But right now the if just needs to check against one component, the x. >> So that is the reason (among others) that my patch 1 changes how if's >> are emitted. So, using both my patches 1 and 2, the assembly outcome >> would be the following: >> 1: cmp.z.f0(8) g20<1>.xD g12<4,4,1>.xD g19<4,4,1>.xD { >> align16 1Q compacted }; >> 2: (+f0.x) if(8) JIP: 6 UIP: 6 { >> align16 1Q }; >> >> It is true that now #1 only updates the x channel. But the if at #2 only >> use that channel now, so the test still passes. >> >>> I think for now the thing to do is add >>> >>> (inst->dst.writemask & ~scan_inst->dst.writemask) != 0) >> This would bail out basically all if's. >> > I agree with Matt that this check is necessary, otherwise > cmod_propagation will replace the pair of instruction with another > instruction which doesn't have equivalent semantics.
Ok. > If you still want > cases like the one you pasted below involving an if conditional that > only reads the x components to be simplified, I suggest you have a look > at dead code elimination and fix it so that writemask bits of an > instruction that writes a flag register (e.g. the "mov.nz.f0") are > turned off for components which are determined to be dead, kind of like > is done already for the vec4 components of GRFs. > > With that change the assembly from your example would look like this > after DCE: > > | 1: cmp.z.f0(8) g20<1>.xD g12<4,4,1>.xD g19<4,4,1>.xD { > align16 1Q compacted }; > | 2: mov.nz.f0(8) null.x g20<4,4,1>.xD { > align16 1Q }; > | 3: (+f0.x) if(8) JIP: 6 UIP: 6 { > align16 1Q }; > > Which cmod propagation would be allowed to simplify further. Ok, thanks, that sounds like a good idea. Will work on that, and send a new patch to this series. > >> So guessing about what is happening here, as I mentioned, we are >> allowing to pass that condition if scan_inst->dst.writemask equal to >> WRITEMASK_X assuming that they come from an if, so we don't need to >> update all f0 channels. But as Jason mentioned in our off-list chat, >> that would not be the case if we are dealing with a sel. So something >> like this: >> 1: cmp.z.f0(8) g20<1>.xD g12<4,4,1>.xD g19<4,4,1>.xD { >> align16 1Q compacted }; >> 2: mov.nz.f0(8) null g20<4,4,1>.xD { >> align16 1Q }; >> 3: (+f0.x) if(8) JIP: 6 UIP: 6 { >> align16 1Q }; >> >> #2 can be optimized out. But if we replace that if for an sel at #3, we >> can't. >> >> If that is the case, in addition to check scan_inst and inst, we would >> need to check in which kind of instruction the flag register is used later. >> >> But as I said, Im just guessing until I see the full failing test. >> >>> to the list of conditions under which we bail out. That is, if the >>> instruction we want to propagate the cmod onto writes fewer channels, >>> we can't do the optimization. >>> >>> With that, the block should look like: >>> >>> if ((scan_inst->predicate && scan_inst->opcode != BRW_OPCODE_SEL) || >>> scan_inst->dst.reg_offset != inst->src[0].reg_offset || >>> (scan_inst->dst.writemask != WRITEMASK_X && >>> scan_inst->dst.writemask != WRITEMASK_XYZW) || >>> (scan_inst->dst.writemask == WRITEMASK_XYZW && >>> inst->src[0].swizzle != BRW_SWIZZLE_XYZW) || >>> (inst->dst.writemask & ~scan_inst->dst.writemask) != 0) >>> break; >>> >>> The good news is that, unless I've done something wrong, this doesn't >>> affect any shaders in shader-db on ILK or HSW (I only tested those >>> two, but I expect the results are the same everywhere). Apparently >>> this is a pretty rare case. >> Are you sure? I have made a run adding your condition, and now comparing >> master vs having the optimization I get this: >> total instructions in shared programs: 6240631 -> 6240471 (-0.00%) >> instructions in affected programs: 18965 -> 18805 (-0.84%) >> helped: 160 >> HURT: 0 >> >> That is a really small gain. Or put in other way, if we compare the >> conditions I have on the original patches vs adding the condition you >> are proposing, I get this: >> >> total instructions in shared programs: 6223900 -> 6240471 (0.27%) >> instructions in affected programs: 477537 -> 494108 (3.47%) >> helped: 0 >> HURT: 3047 >> >>> With that change, my R-b still stands, though we should have a unit >>> test for this case as well in 3/5. Taking into account Francisco's email, I will add that condition to the cmod propagation optimization, adding your review, and I will update the unit tests handling this cases. BR -- Alejandro Piñeiro (apinhe...@igalia.com) _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev