On Sat, Sep 20, 2014 at 1:23 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote: > --- > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 24 > ++++++++++++++++++++++- > src/mesa/drivers/dri/i965/intel_screen.h | 5 +++++ > 2 files changed, 28 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > index b42f1e5..41e8855 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > @@ -116,8 +116,10 @@ brw_alloc_reg_set(struct intel_screen *screen, int > reg_width) > /* Compute the total number of registers across all classes. */ > int ra_reg_count = 0; > for (int i = 0; i < class_count; i++) { > + screen->wm_reg_sets[index].class_first_reg[i] = ra_reg_count; > ra_reg_count += base_reg_count - (class_sizes[i] - 1); > } > + screen->wm_reg_sets[index].class_first_reg[class_count] = ra_reg_count; > > uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t, ra_reg_count); > struct ra_regs *regs = ra_alloc_reg_set(screen, ra_reg_count); > @@ -469,9 +471,29 @@ fs_visitor::assign_regs(bool allow_spilling) > } > > setup_payload_interference(g, payload_node_count, first_payload_node); > - if (brw->gen >= 7) > + if (brw->gen >= 7) { > setup_mrf_hack_interference(g, first_mrf_hack_node); > > + foreach_in_list(fs_inst, inst, &instructions) { > + /* When we do send-from-GRF for FB writes, we need to ensure that > + * the last write instruction sends from a high register. This is > + * because the vertex fetcher wants to start filling the low > + * payload registers while the pixel data port is still working on > + * writing out the memory. If we don't do this, we get rendering > + * artifacts. > + * > + * We could just do "something high". Instead, we just pick the > + * highest register that works. > + */ > + if (inst->opcode == FS_OPCODE_FB_WRITE && inst->eot) { > + int size = virtual_grf_sizes[inst->src[0].reg]; > + int reg = screen->wm_reg_sets[rsi].class_first_reg[size] - 1; > + ra_set_node_reg(g, inst->src[0].reg, reg); > + break; > + } > + } > + } > + > if (dispatch_width > 8) { > /* In 16-wide dispatch we have an issue where a compressed > * instruction is actually two instructions executed simultaneiously. > diff --git a/src/mesa/drivers/dri/i965/intel_screen.h > b/src/mesa/drivers/dri/i965/intel_screen.h > index 945f6f5..8bc6abd 100644 > --- a/src/mesa/drivers/dri/i965/intel_screen.h > +++ b/src/mesa/drivers/dri/i965/intel_screen.h > @@ -90,6 +90,11 @@ struct intel_screen > int classes[16]; > > /** > + * Array of the first ra reg in eahch ra class.
Typo. > + */ > + int class_first_reg[17]; > + > + /** > * Mapping for register-allocated objects in *regs to the first > * GRF for that object. > */ > -- > 2.1.0 > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev I think it would be less confusing if instead of introducing a "class_first_reg" array you added a class_last_reg array and then did class_last_reg[size - 1] - iiuc, you're relying on the fact that the first reg of one class minus one is the last reg of the previous class, which isn't obvious and it takes a while to realize what's going on. BTW, just so you know, I don't think it's generally a good idea to set more than one node to be pre-allocated to the same register, since things will blow up if those nodes interfere. It's probably OK here since I can't think of a way that an input register preallocated to the highest register would interfere with the input to the fb write, but in the future we'll probably want to think about how to handle this in a more sophisticated way. This is sort of on a tangent, but with SSA register allocation we should probably allocate load_payload destinations before anything else and then treat load_payload as a series of writes to preallocated registers, since spilling the result of a load_payload seems like a silly thing to do and then we don't have to deal with allocating a contiguous set of registers which breaks some nice properties of SSA register allocation. Then, we could have a simple heuristic to allocate the source of an fb write as high as possible, or just allocate all load_payload instructions as high as possible. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev