On Curro's suggestion, I tried reproducing the problem that the patch fixes. Not only could I not reproduce it, but it seems like the original workaround that Jason added back when he reworked the fs backend ages ago is unnecessary at least on my ivybridge and broadwell machines. Since reproducing it involves specific register-allocation choices, I hacked the fs visitor to skip emit_nir_code() and instead produce a specific program with some hardcoded register assignments, then picked a random piglit test (tests/shaders/glsl-fs-abs-neg.shader_test) and ran it with INTEL_DEBUG=no8. The resulting assembly looked like:
Native code for unnamed fragment shader GLSL3 SIMD16 shader: 7 instructions. 0 loops. 58 cycles. 0:0 spills:fills. Promoted 0 constants. Compacted 112 to 96 bytes (14%) START B0 mov(16) g10<1>F -1F { align1 1H }; mov(16) g122<1>F 1F { align1 1H }; mov(16) g124<1>F [0F, 0F, 0F, 0F]VF { align1 1H compacted }; mov(16) g126<1>F 1F { align1 1H }; add(16) g11<1>F g10<8,8,1>F 1F { align1 1H }; mov(16) g120<1>F g11<8,8,1>F { align1 1H compacted }; sendc(16) null<1>UW g120<8,8,1>F render RT write SIMD16 LastRT Surface = 0 mlen 8 rlen 0 { align1 1H EOT }; END B0 If the comment Jason added is correct, then add(16) g11<1>F g10<8,8,1>F 1F { align1 1H }; will get expanded into add(8) g11<1>F g10<8,8,1>F 1F { align1 1Q }; add(8) g12<1>F g11<8,8,1>F 1F { align1 2Q }; And the add will happen twice for g11, making some of the pixels yellow instead of green. But on the machines I tested, this resulted in pure green. I also tried a similar thing for fp64, but there was still no problem. I Also tested the similar problem described in fs_inst::has_source_and_destination_hazard() with this program: Native code for unnamed fragment shader GLSL3 SIMD16 shader: 7 instructions. 0 loops. 56 cycles. 0:0 spills:fills. Promoted 0 constants. Compacted 112 to 96 bytes (14%) START B0 mov(1) g10<1>F -1F { align1 WE_all }; mov(16) g122<1>F 1F { align1 1H }; mov(16) g124<1>F [0F, 0F, 0F, 0F]VF { align1 1H compacted }; mov(16) g126<1>F 1F { align1 1H }; add(16) g10<1>F g10<0,1,0>F 1F { align1 1H }; mov(16) g120<1>F g10<8,8,1>F { align1 1H compacted }; sendc(16) null<1>UW g120<8,8,1>F render RT write SIMD16 LastRT Surface = 0 mlen 8 rlen 0 { align1 1H EOT }; END B0 and again the comment seems to be wrong. Also, there were no piglit regressions with the patch reverted, and the only tests broken with the original workarounds reverted are ones that use math instructions which we split to SIMD8 in the generator (but probably shouldn't be). So for now, we can probably drop this patch, and if we can do more testing, then we can probably drop the original workaround or restrict it to older gens. If anyone wants to test this on other platforms than broadwell and ivybridge, the hacks I made are available on my fdo repo on the test-compressed-wa and test-fp64-compressed-wa (the former has two commits, go back to the first to produce the first assembly). On Fri, Apr 29, 2016 at 7:29 AM, Samuel Iglesias Gonsálvez <sigles...@igalia.com> wrote: > From: Connor Abbott <connor.w.abb...@intel.com> > > Work based on registers read/written instead of dispatch_width, so that > the interferences are added for 64-bit sources/destinations as well. > --- > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 67 > ++++++++++++++++------- > 1 file changed, 48 insertions(+), 19 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > index 2347cd5..f0e96f9 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > @@ -541,6 +541,34 @@ setup_mrf_hack_interference(fs_visitor *v, struct > ra_graph *g, > } > } > > +static unsigned > +get_reg_node(const fs_reg& reg, unsigned first_payload_node) > +{ > + switch (reg.file) { > + case VGRF: > + return reg.nr; > + case ARF: > + case FIXED_GRF: > + return first_payload_node + reg.nr; > + case MRF: > + default: > + unreachable("unhandled register file"); > + } > + > + return 0; > +} > + > +static void > +add_reg_interference(struct ra_graph *g, const fs_reg& reg1, > + const fs_reg& reg2, unsigned first_payload_node) > +{ > + if ((reg1.file == VGRF || reg1.file == ARF || reg1.file == FIXED_GRF) && > + (reg2.file == VGRF || reg2.file == ARF || reg2.file == FIXED_GRF)) { > + ra_add_node_interference(g, get_reg_node(reg1, first_payload_node), > + get_reg_node(reg2, first_payload_node)); > + } > +} > + > bool > fs_visitor::assign_regs(bool allow_spilling) > { > @@ -643,26 +671,27 @@ fs_visitor::assign_regs(bool allow_spilling) > } > } > > - if (dispatch_width > 8) { > - /* In 16-wide dispatch we have an issue where a compressed > - * instruction is actually two instructions executed simultaneiously. > - * It's actually ok to have the source and destination registers be > - * the same. In this case, each instruction over-writes its own > - * source and there's no problem. The real problem here is if the > - * source and destination registers are off by one. Then you can end > - * up in a scenario where the first instruction over-writes the > - * source of the second instruction. Since the compiler doesn't know > - * about this level of granularity, we simply make the source and > - * destination interfere. > - */ > - foreach_block_and_inst(block, fs_inst, inst, cfg) { > - if (inst->dst.file != VGRF) > - continue; > + /* When instructions both read/write more than a single SIMD8 register, we > + * have an issue where an instruction is actually two instructions > executed > + * simultaneiously. It's actually ok to have the source and destination > + * registers be the same. In this case, each instruction over-writes its > + * own source and there's no problem. The real problem here is if the > + * source and destination registers are off by one. Then you can end up > in > + * a scenario where the first instruction over-writes the source of the > + * second instruction. Since the compiler doesn't know about this level > of > + * granularity, we simply make the source and destination interfere. > + */ > + foreach_block_and_inst(block, fs_inst, inst, cfg) { > + if (inst->dst.file != VGRF && > + inst->dst.file != ARF && inst->dst.file != FIXED_GRF) > + continue; > > - for (int i = 0; i < inst->sources; ++i) { > - if (inst->src[i].file == VGRF) { > - ra_add_node_interference(g, inst->dst.nr, inst->src[i].nr); > - } > + if (inst->regs_written <= 1) > + continue; > + > + for (int i = 0; i < inst->sources; ++i) { > + if (inst->regs_read(i) > 1) { > + add_reg_interference(g, inst->dst, inst->src[i], > first_payload_node); > } > } > } > -- > 2.5.0 > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev