On 21 October 2013 18:34, Eric Anholt <e...@anholt.net> wrote: > Paul Berry <stereotype...@gmail.com> writes: > > > On 18 October 2013 17:04, Eric Anholt <e...@anholt.net> wrote: > >> Putting these fixups for a couple of weird cases in just MOV and ADD > >> feels wrong to me, but maybe when I understand better what's going on > >> it'll seem more natural. > >> > > > > Another possibility I'd be equally happy with would be to put the fixup > at > > the top of vec4_generator::generate_vec4_instruction(), before the switch > > statement. It would look something like this: > > > > if (dst.width == BRW_WIDTH_4) { > > /* This happens in attribute fixups for "dual instanced" geometry > > * shaders, since they use attributes that are vec4's. Since the > exec > > * width is only 4, it's essential that the caller set > > * force_writemask_all in order to make sure the instruction is > > executed > > * regardless of which channels are enabled. > > */ > > assert(inst->force_writemask_all); > > > > /* Fix up any <8;8,1> or <0;4,1> source registers to <4;4,1> to > > satisfy > > * the following register region restrictions (from Graphics BSpec: > > * 3D-Media-GPGPU Engine > EU Overview > Registers and Register > > Regions > > * > Register Region Restrictions) > > * > > * 1. ExecSize must be greater than or equal to Width. > > * > > * 2. If ExecSize = Width and HorzStride != 0, VertStride must > be > > set > > * to Width * HorzStride." > > */ > > for (int i = 0; i < 3; i++) { > > if (src[i].file == BRW_GENERAL_REGISTER_FILE) > > src[i] = stride(src[i], 4, 4, 1); > > } > > } > > > > Does that seem better to you? I actually think I like it slightly better > > because by making the assertion more general, I caught another case > where I > > think I should be setting force_writemask_all to be on the safe side (the > > "clear r0.2" instruction in the gs prolog). > > I like this better -- it makes more sense to me for the fixup to be > non-opcode-specific. Any I think I get the problem now -- our registers > would make a ton of sense as <4;4,1> in general (that's how I think of > our GRFs in align16, at least!), except that we can't because then we'd > guess an execsize of 4.
Agreed. And it's not just how you think of our GRF's in align16. <4;4,1> is what we actually emit to the hardware, once the fixups at the bottom of brw_set_src{0,1}() take effect. > I'm in favor of the kill-guess-execsize plan, > even if we leave it in place for gen4/5 SF/CLIP threads (which bounce > execsize all over the place iirc and would be a pain to convert) and > only convert the new backends. > > Another option: How about instead of that assert in brw_eu_emit.c, we > just smash the vstride to be width * hstride? We know the vstride > doesn't matter, because you're only using execsize components, so let's > just not bother our brw_eu.c callers with this little problem. > I'm not really a fan of this--it feels like it's perpetuating the "<8;8,1> instead of <4;4,1>" lie by introducing even more code in brw_eu_emit.c to fix things up when we use a bogus <8;8,1> region. Also, if we do this, there's no convenient place to put the assert(force_writemask_all), which I think is really valuable. I would rather stick with the fixup at the top of generate_vec4_instruction() for now, and revisit the decision when we get rid of guess_execution_size().
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev