On Sat, Sep 20, 2014 at 4:56 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote: > > > On Sat, Sep 20, 2014 at 12:44 PM, Connor Abbott <cwabbo...@gmail.com> wrote: >> >> On Sat, Sep 20, 2014 at 1:23 PM, Jason Ekstrand <ja...@jlekstrand.net> >> wrote: >> > --- >> > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 24 >> > ++++++++++++++++++++++- >> > src/mesa/drivers/dri/i965/intel_screen.h | 5 +++++ >> > 2 files changed, 28 insertions(+), 1 deletion(-) >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp >> > b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp >> > index b42f1e5..41e8855 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp >> > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp >> > @@ -116,8 +116,10 @@ brw_alloc_reg_set(struct intel_screen *screen, int >> > reg_width) >> > /* Compute the total number of registers across all classes. */ >> > int ra_reg_count = 0; >> > for (int i = 0; i < class_count; i++) { >> > + screen->wm_reg_sets[index].class_first_reg[i] = ra_reg_count; >> > ra_reg_count += base_reg_count - (class_sizes[i] - 1); >> > } >> > + screen->wm_reg_sets[index].class_first_reg[class_count] = >> > ra_reg_count; >> > >> > uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t, >> > ra_reg_count); >> > struct ra_regs *regs = ra_alloc_reg_set(screen, ra_reg_count); >> > @@ -469,9 +471,29 @@ fs_visitor::assign_regs(bool allow_spilling) >> > } >> > >> > setup_payload_interference(g, payload_node_count, >> > first_payload_node); >> > - if (brw->gen >= 7) >> > + if (brw->gen >= 7) { >> > setup_mrf_hack_interference(g, first_mrf_hack_node); >> > >> > + foreach_in_list(fs_inst, inst, &instructions) { >> > + /* When we do send-from-GRF for FB writes, we need to ensure >> > that >> > + * the last write instruction sends from a high register. >> > This is >> > + * because the vertex fetcher wants to start filling the low >> > + * payload registers while the pixel data port is still >> > working on >> > + * writing out the memory. If we don't do this, we get >> > rendering >> > + * artifacts. >> > + * >> > + * We could just do "something high". Instead, we just pick >> > the >> > + * highest register that works. >> > + */ >> > + if (inst->opcode == FS_OPCODE_FB_WRITE && inst->eot) { >> > + int size = virtual_grf_sizes[inst->src[0].reg]; >> > + int reg = screen->wm_reg_sets[rsi].class_first_reg[size] - >> > 1; >> > + ra_set_node_reg(g, inst->src[0].reg, reg); >> > + break; >> > + } >> > + } >> > + } >> > + >> > if (dispatch_width > 8) { >> > /* In 16-wide dispatch we have an issue where a compressed >> > * instruction is actually two instructions executed >> > simultaneiously. >> > diff --git a/src/mesa/drivers/dri/i965/intel_screen.h >> > b/src/mesa/drivers/dri/i965/intel_screen.h >> > index 945f6f5..8bc6abd 100644 >> > --- a/src/mesa/drivers/dri/i965/intel_screen.h >> > +++ b/src/mesa/drivers/dri/i965/intel_screen.h >> > @@ -90,6 +90,11 @@ struct intel_screen >> > int classes[16]; >> > >> > /** >> > + * Array of the first ra reg in eahch ra class. >> >> Typo. > > > Thanks > >> >> >> > + */ >> > + int class_first_reg[17]; >> > + >> > + /** >> > * Mapping for register-allocated objects in *regs to the first >> > * GRF for that object. >> > */ >> > -- >> > 2.1.0 >> > >> > _______________________________________________ >> > mesa-dev mailing list >> > mesa-dev@lists.freedesktop.org >> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev >> >> I think it would be less confusing if instead of introducing a >> "class_first_reg" array you added a class_last_reg array and then did >> class_last_reg[size - 1] - iiuc, you're relying on the fact that the >> first reg of one class minus one is the last reg of the previous >> class, which isn't obvious and it takes a while to realize what's >> going on. > > > That's fair. I can change that easily enough. It doesn't really matter > that much. I just kind of liked the fact that last_reg[n] - last_reg[n-1] > == class_size. But it doesn't matter and what you suggested would be far > more obvious. > >> >> BTW, just so you know, I don't think it's generally a good idea to set >> more than one node to be pre-allocated to the same register, since >> things will blow up if those nodes interfere. It's probably OK here >> since I can't think of a way that an input register preallocated to >> the highest register would interfere with the input to the fb write, > > > Yes, I'm aware of the problem. However, the only things we preallocate are > payload registers which only ever go at the beginning and the final FB write > which goes at the end. As long as we don't start preallocating more things, > I think we'll be fine. Also, isn't there some cap on how many payload > registers we can have? If so then that already solves our problem.
Yeah, that's what I meant. I'm not sure if the varying payload registers can fill up the entire register space (and hence one gets allocated to the same register as the fb write load_payload destination), but I guess it doesn't matter too much. > >> >> but in the future we'll probably want to think about how to handle >> this in a more sophisticated way. This is sort of on a tangent, but >> with SSA register allocation we should probably allocate load_payload >> destinations before anything else and then treat load_payload as a >> series of writes to preallocated registers, since spilling the result >> of a load_payload seems like a silly thing to do and then we don't >> have to deal with allocating a contiguous set of registers which > > > If only that were true... The problem is that we will *always* have the > problem of contiguous registers. Once we add fp64 to the mix, we will have > registers of size 1 and 2 in SIMD8 and registers of size 1 (Unsigned single > words), 2 (dwords, floats, etc) and 4 (doubles) in SIMD16. Sure, we can > probably preallocate payload registers, but we'll never get away from > handling multiple sizes. True, there's still that problem. And don't forget that we want to be able to allocate scalar/uniform registers, so now you have registers of size 1, 8, and 16... I think Periera's puzzle-solving allocator (http://compilers.cs.ucla.edu/fernando/publications/papers/PhdDiss.pdf) handles the case where the register classes are aligned, so if we make SIMD16 registers aligned like we currently do it might work. It's definitely a difficult problem though, and it'll take a lot of work to solve. > >> >> breaks some nice properties of SSA register allocation. Then, we could >> have a simple heuristic to allocate the source of an fb write as high >> as possible, or just allocate all load_payload instructions as high as >> possible. > > > I've actually been thinking about the register allocation problem quite a > bit lately. One of the things I've got on my todo list is to do some > research and figure out if we can be doing it better. There may be a better > way to handle the multiple sizes problem than we're doing currently. > Unfortunately, I can't really say much now because I haven't dug into it > very far yet. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev