On 18 October 2013 17:04, Eric Anholt <e...@anholt.net> wrote: > Paul Berry <stereotype...@gmail.com> writes: > > > Geometry shaders that run in "DUAL_INSTANCED" mode store their inputs > > in vec4's. This means that when compiling gl_PointSize input > > swizzling (a MOV instruction which uses a geometry shader input as > > both source and destination), we need to do two things: > > > > - Set force_writemask_all to ensure that the MOV happens regardless of > > which channels are enabled. > > > > - Set the source register region to <4;4,1> (instead of <0;4,1> to > > satisfy register region restrictions. > > This sure sounds like something you empirically found, but I'm confused. > I would have assumed that DUAL_INSTANCED with an instance count of 1 > meant that the channel enables the hardware gave you had the first 4 > enabled and the second 4 disabled. And since the dst.width (and thus > execsize) is 4, whether or not the second 4 are disabled wouldn't > matter. In that case, why do you need the writemask forced, since just > the 4 channels you care about will be affected, anyway? >
Setting force_writemask_all was not empirical--I did it out of general caution. I couldn't find any documentation in the bspec to guarantee that an instance count of 1 would get dispatched to the first 4 channels rather than the second 4, and I didn't want to rely on undocumented behaviour. > > And, if dst.width == 4, then execsize == 4, and I'm confused what > register region restriction is being honored by promoting the hstride to > 4. > This part was empirical. I discovered that if I don't set the hstride to 4, then I get an assertion failure here (in validate_reg() in brw_eu_emit.c): /* 4. */ if (execsize == width && hstride != 0) { assert(vstride == -1 || vstride == width * hstride); } I don't know why the hardware cares about this, but I double-checked the bspec and this restriction is really there. Side note: I forgot to mention it in the comments, but in addition to fixing up <0;4,1> -> <4;4,1>, this code fixes up <8;8,1> -> <4;4,1>. That's necessary for a stupid reason: in the vec4 back-end we represent GRFs as <8;8,1> so that guess_execution_size() will correctly guess an execution size of 8. However, in align16 mode, the hardware assumes a width of 4, so it really needs to be <4;4,1>. Normally, we fix that up in brw_set_src0() and brw_set_src1(), but we do it *after* the call to validate_reg(). So to avoid validate_reg() asserting on these width-4 instructions, we need to change the source register from <8;8,1> to <4;4,1> before emitting the instruction. One of these days, I swear I'm going to get rid of guess_execution_size() so we can end this sort of madness. > > Putting these fixups for a couple of weird cases in just MOV and ADD > feels wrong to me, but maybe when I understand better what's going on > it'll seem more natural. > Another possibility I'd be equally happy with would be to put the fixup at the top of vec4_generator::generate_vec4_instruction(), before the switch statement. It would look something like this: if (dst.width == BRW_WIDTH_4) { /* This happens in attribute fixups for "dual instanced" geometry * shaders, since they use attributes that are vec4's. Since the exec * width is only 4, it's essential that the caller set * force_writemask_all in order to make sure the instruction is executed * regardless of which channels are enabled. */ assert(inst->force_writemask_all); /* Fix up any <8;8,1> or <0;4,1> source registers to <4;4,1> to satisfy * the following register region restrictions (from Graphics BSpec: * 3D-Media-GPGPU Engine > EU Overview > Registers and Register Regions * > Register Region Restrictions) * * 1. ExecSize must be greater than or equal to Width. * * 2. If ExecSize = Width and HorzStride != 0, VertStride must be set * to Width * HorzStride." */ for (int i = 0; i < 3; i++) { if (src[i].file == BRW_GENERAL_REGISTER_FILE) src[i] = stride(src[i], 4, 4, 1); } } Does that seem better to you? I actually think I like it slightly better because by making the assertion more general, I caught another case where I think I should be setting force_writemask_all to be on the safe side (the "clear r0.2" instruction in the gs prolog).
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev