On Sat, Sep 20, 2014 at 5:58 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
>
> On Sep 20, 2014 2:52 PM, "Connor Abbott" <cwabbo...@gmail.com> wrote:
>>
>> On Sat, Sep 20, 2014 at 4:56 PM, Jason Ekstrand <ja...@jlekstrand.net>
>> wrote:
>> >
>> >
>> > On Sat, Sep 20, 2014 at 12:44 PM, Connor Abbott <cwabbo...@gmail.com>
>> > wrote:
>> >>
>> >> On Sat, Sep 20, 2014 at 1:23 PM, Jason Ekstrand <ja...@jlekstrand.net>
>> >> wrote:
>> >> > ---
>> >> >  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 24 ++++++++++++++++++++++-
>> >> >  src/mesa/drivers/dri/i965/intel_screen.h          |  5 +++++
>> >> >  2 files changed, 28 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> >> > index b42f1e5..41e8855 100644
>> >> > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> >> > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> >> > @@ -116,8 +116,10 @@ brw_alloc_reg_set(struct intel_screen *screen, int reg_width)
>> >> >     /* Compute the total number of registers across all classes.
>> >> >      */
>> >> >     int ra_reg_count = 0;
>> >> >     for (int i = 0; i < class_count; i++) {
>> >> > +      screen->wm_reg_sets[index].class_first_reg[i] = ra_reg_count;
>> >> >        ra_reg_count += base_reg_count - (class_sizes[i] - 1);
>> >> >     }
>> >> > +   screen->wm_reg_sets[index].class_first_reg[class_count] = ra_reg_count;
>> >> >
>> >> >     uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t, ra_reg_count);
>> >> >     struct ra_regs *regs = ra_alloc_reg_set(screen, ra_reg_count);
>> >> > @@ -469,9 +471,29 @@ fs_visitor::assign_regs(bool allow_spilling)
>> >> >     }
>> >> >
>> >> >     setup_payload_interference(g, payload_node_count, first_payload_node);
>> >> > -   if (brw->gen >= 7)
>> >> > +   if (brw->gen >= 7) {
>> >> >        setup_mrf_hack_interference(g, first_mrf_hack_node);
>> >> >
>> >> > +      foreach_in_list(fs_inst, inst, &instructions) {
>> >> > +         /* When we do send-from-GRF for FB writes, we need to ensure that
>> >> > +          * the last write instruction sends from a high register.  This is
>> >> > +          * because the vertex fetcher wants to start filling the low
>> >> > +          * payload registers while the pixel data port is still working on
>> >> > +          * writing out the memory.  If we don't do this, we get rendering
>> >> > +          * artifacts.
>> >> > +          *
>> >> > +          * We could just do "something high".  Instead, we just pick the
>> >> > +          * highest register that works.
>> >> > +          */
>> >> > +         if (inst->opcode == FS_OPCODE_FB_WRITE && inst->eot) {
>> >> > +            int size = virtual_grf_sizes[inst->src[0].reg];
>> >> > +            int reg = screen->wm_reg_sets[rsi].class_first_reg[size] - 1;
>> >> > +            ra_set_node_reg(g, inst->src[0].reg, reg);
>> >> > +            break;
>> >> > +         }
>> >> > +      }
>> >> > +   }
>> >> > +
>> >> >     if (dispatch_width > 8) {
>> >> >        /* In 16-wide dispatch we have an issue where a compressed
>> >> >         * instruction is actually two instructions executed simultaneiously.
>> >> >
>> >> > diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h
>> >> > index 945f6f5..8bc6abd 100644
>> >> > --- a/src/mesa/drivers/dri/i965/intel_screen.h
>> >> > +++ b/src/mesa/drivers/dri/i965/intel_screen.h
>> >> > @@ -90,6 +90,11 @@ struct intel_screen
>> >> >     int classes[16];
>> >> >
>> >> >     /**
>> >> > +    * Array of the first ra reg in eahch ra class.
>> >>
>> >> Typo.
>> >
>> > Thanks
>> >
>> >>
>> >> > +    */
>> >> > +   int class_first_reg[17];
>> >> > +
>> >> > +   /**
>> >> >      * Mapping for register-allocated objects in *regs to the first
>> >> >      * GRF for that object.
>> >> >      */
>> >> > --
>> >> > 2.1.0
>> >> >
>> >> > _______________________________________________
>> >> > mesa-dev mailing list
>> >> > mesa-dev@lists.freedesktop.org
>> >> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> >>
>> >> I think it would be less confusing if instead of introducing a
>> >> "class_first_reg" array you added a class_last_reg array and then did
>> >> class_last_reg[size - 1] - iiuc, you're relying on the fact that the
>> >> first reg of one class minus one is the last reg of the previous
>> >> class, which isn't obvious and it takes a while to realize what's
>> >> going on.
>> >
>> > That's fair.  I can change that easily enough.  It doesn't really matter
>> > that much.  I just kind of liked the fact that last_reg[n] -
>> > last_reg[n-1] == class_size.  But it doesn't matter and what you
>> > suggested would be far more obvious.
>> >
>> >> BTW, just so you know, I don't think it's generally a good idea to set
>> >> more than one node to be pre-allocated to the same register, since
>> >> things will blow up if those nodes interfere.  It's probably OK here
>> >> since I can't think of a way that an input register preallocated to
>> >> the highest register would interfere with the input to the fb write,
>> >
>> > Yes, I'm aware of the problem.
>> > However, the only things we preallocate are payload registers, which
>> > only ever go at the beginning, and the final FB write, which goes at
>> > the end.  As long as we don't start preallocating more things, I think
>> > we'll be fine.  Also, isn't there some cap on how many payload
>> > registers we can have?  If so, then that already solves our problem.
>>
>> Yeah, that's what I meant.  I'm not sure if the varying payload
>> registers can fill up the entire register space (and hence one gets
>> allocated to the same register as the fb write load_payload
>> destination), but I guess it doesn't matter too much.
>>
>> >
>> >> but in the future we'll probably want to think about how to handle
>> >> this in a more sophisticated way.  This is sort of on a tangent, but
>> >> with SSA register allocation we should probably allocate load_payload
>> >> destinations before anything else and then treat load_payload as a
>> >> series of writes to preallocated registers, since spilling the result
>> >> of a load_payload seems like a silly thing to do, and then we don't
>> >> have to deal with allocating a contiguous set of registers, which
>> >
>> > If only that were true...  The problem is that we will *always* have
>> > the problem of contiguous registers.  Once we add fp64 to the mix, we
>> > will have registers of size 1 and 2 in SIMD8, and registers of size 1
>> > (unsigned single words), 2 (dwords, floats, etc.), and 4 (doubles) in
>> > SIMD16.  Sure, we can probably preallocate payload registers, but
>> > we'll never get away from handling multiple sizes.
>>
>> True, there's still that problem.  And don't forget that we want to be
>> able to allocate scalar/uniform registers, so now you have registers
>> of size 1, 8, and 16...
>
> Yeah.  I think right now we always do those with pulls, so it's not a
> problem yet.  We may want to do that in the future, though.
Well, I meant more realizing when computations are done uniformly across
all the EUs and then storing the result in only 1 register instead of 8 or
16 to save register pressure.  I'm not sure if that's what you were
thinking of.

>
>> I think Pereira's puzzle-solving allocator
>> (http://compilers.cs.ucla.edu/fernando/publications/papers/PhdDiss.pdf)
>> handles the case where the register classes are aligned, so if we make
>> SIMD16 registers aligned like we currently do, it might work.  It's
>> definitely a difficult problem, though, and it'll take a lot of work to
>> solve.
>
> Thanks for the link.  I'll give it a look.
> --Jason
>
>> >
>> >> breaks some nice properties of SSA register allocation.  Then, we
>> >> could have a simple heuristic to allocate the source of an fb write
>> >> as high as possible, or just allocate all load_payload instructions
>> >> as high as possible.
>> >
>> > I've actually been thinking about the register allocation problem
>> > quite a bit lately.  One of the things I've got on my todo list is to
>> > do some research and figure out if we can be doing it better.  There
>> > may be a better way to handle the multiple sizes problem than we're
>> > doing currently.  Unfortunately, I can't really say much now because
>> > I haven't dug into it very far yet.