On Tue, May 3, 2016 at 3:46 PM, Kenneth Graunke <kenn...@whitecape.org> wrote: > On Tuesday, May 3, 2016 3:13:26 PM PDT Connor Abbott wrote: >> On Tue, May 3, 2016 at 2:52 PM, Kenneth Graunke <kenn...@whitecape.org> > wrote: >> > On Friday, April 29, 2016 1:29:34 PM PDT Samuel Iglesias Gonsálvez wrote: >> >> From: Connor Abbott <connor.w.abb...@intel.com> >> >> >> >> Work based on registers read/written instead of dispatch_width, so that >> >> the interferences are added for 64-bit sources/destinations as well. >> >> --- >> >> src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 67 +++++++++++++++ >> > +------- >> >> 1 file changed, 48 insertions(+), 19 deletions(-) >> >> >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/ > mesa/ >> > drivers/dri/i965/brw_fs_reg_allocate.cpp >> >> index 2347cd5..f0e96f9 100644 >> >> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp >> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp >> >> @@ -541,6 +541,34 @@ setup_mrf_hack_interference(fs_visitor *v, struct >> > ra_graph *g, >> >> } >> >> } >> >> >> >> +static unsigned >> >> +get_reg_node(const fs_reg& reg, unsigned first_payload_node) >> >> +{ >> >> + switch (reg.file) { >> >> + case VGRF: >> >> + return reg.nr; >> >> + case ARF: >> >> + case FIXED_GRF: >> >> + return first_payload_node + reg.nr; >> > >> > This can't possibly be right for ARFs. I think it's OK for FIXED_GRF. >> > >> > I imagine we can just drop ARF. >> > >> >> + case MRF: >> >> + default: >> >> + unreachable("unhandled register file"); >> >> + } >> >> + >> >> + return 0; >> >> +} >> >> + >> >> +static void >> >> +add_reg_interference(struct ra_graph *g, const fs_reg& reg1, >> >> + const fs_reg& reg2, unsigned first_payload_node) >> >> +{ >> >> + if ((reg1.file == VGRF || reg1.file == ARF || reg1.file == FIXED_GRF) > && >> >> + (reg2.file == VGRF || reg2.file == ARF || reg2.file == > FIXED_GRF)) { >> >> + ra_add_node_interference(g, get_reg_node(reg1, > first_payload_node), >> >> + get_reg_node(reg2, first_payload_node)); >> >> + } >> >> +} >> >> + >> >> bool >> >> fs_visitor::assign_regs(bool allow_spilling) >> >> { >> >> @@ -643,26 +671,27 @@ fs_visitor::assign_regs(bool allow_spilling) >> >> } >> >> } >> >> >> >> - if (dispatch_width > 8) { >> >> - /* In 16-wide dispatch we have an issue where a compressed >> >> - * instruction is actually two instructions executed > simultaneiously. >> >> - * It's actually ok to have the source and destination registers > be >> >> - * the same. In this case, each instruction over-writes its own >> >> - * source and there's no problem. The real problem here is if the >> >> - * source and destination registers are off by one. Then you can > end >> >> - * up in a scenario where the first instruction over-writes the >> >> - * source of the second instruction. Since the compiler doesn't > know >> >> - * about this level of granularity, we simply make the source and >> >> - * destination interfere. >> >> - */ >> >> - foreach_block_and_inst(block, fs_inst, inst, cfg) { >> >> - if (inst->dst.file != VGRF) >> >> - continue; >> >> + /* When instructions both read/write more than a single SIMD8 > register, >> > we >> >> + * have an issue where an instruction is actually two instructions >> > executed >> >> + * simultaneiously. It's actually ok to have the source and > destination >> >> + * registers be the same. In this case, each instruction over-writes >> > its >> >> + * own source and there's no problem. The real problem here is if > the >> >> + * source and destination registers are off by one. Then you can end > up >> > in >> >> + * a scenario where the first instruction over-writes the source of > the >> >> + * second instruction. Since the compiler doesn't know about this > level >> > of >> >> + * granularity, we simply make the source and destination interfere. >> >> + */ >> >> + foreach_block_and_inst(block, fs_inst, inst, cfg) { >> >> + if (inst->dst.file != VGRF && >> >> + inst->dst.file != ARF && inst->dst.file != FIXED_GRF) >> >> + continue; >> >> >> >> - for (int i = 0; i < inst->sources; ++i) { >> >> - if (inst->src[i].file == VGRF) { >> >> - ra_add_node_interference(g, inst->dst.nr, inst- >>src[i].nr); >> >> - } >> >> + if (inst->regs_written <= 1) >> >> + continue; >> >> + >> >> + for (int i = 0; i < inst->sources; ++i) { >> >> + if (inst->regs_read(i) > 1) { >> >> + add_reg_interference(g, inst->dst, inst->src[i], >> > first_payload_node); >> >> } >> >> } >> >> } >> >> >> > >> > This seems wrong to me. The point of the original code is that SIMD16 >> > instructions internally decode as two SIMD8 instructions, which operate >> > sequentially. So, the first runs, and clobbers the source for the >> > second. >> > >> > I can't imagine that this happens for double floats - I imagine the >> > hardware reads and writes entire DFs in one instruction. >> > >> > I'm not sure why this patch is needed at all. Can we drop it, or do >> > things break? >> >> AFAICT, that's not how things work. Essentially, the HW can't >> read/write more than one SIMD8 register at a time, so even for SIMD8 >> it has to break double reads/writes into two. If you think about it, >> this makes the restriction on horizontal strides not crossing a >> register make a lot more sense -- the HW just uses the horizontal >> stride inside a single read/write, and the vertical stride to figure >> out how to break up an instruction into multiple instructions. To me, >> this theory seemed to be confirmed by the fact that this patch did fix >> things, at least when I wrote it. You could try reverting it and >> seeing if it breaks things now, though. > > Okay, I think I misunderstood. I was thinking this meant it would > somehow break up a DF operation into multiple instructions, each of > which operated on...part of a double...which would be insane. > > But I can definitely see it breaking up something like: > > add(8) g10:DF g10:DF g20:DF > > into > > add(4) g10:DF g10:DF g20:DF > add(4) g11:DF g11:DF g21:DF > > to avoid crossing register boundaries. That makes sense.
Right, I think that's what's happening. > > I'm still curious whether this helps anything today. I can try and run piglit with the patch reverted later (not today), although you're free to try to :) > > I also think this may have some unintended side effects: a send message > usually reads multiple registers, and writes multiple registers, and > I'm not sure that we need them to conflict. (I think the original code > this is replacing was unnecessarily harsh as well...) Right, I think sends should read/write registers atomically... I guess we need to fix this up. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev