On Tue, 2016-09-13 at 22:24 -0700, Francisco Jerez wrote: > Iago Toral <ito...@igalia.com> writes: > > > > > On Mon, 2016-09-12 at 14:19 -0700, Francisco Jerez wrote: > > > > > > Iago Toral Quiroga <ito...@igalia.com> writes: > > > > > > > > > > > > > > > SIMD4x2 64bit data is stored in register space like this: > > > > > > > > r0.0:DF x0 y0 z0 w0 > > > > r0.1:DF x1 y1 z1 w1 > > > > > > > > When we need to write data such as this to memory using 32-bit > > > > write > > > > messages we need to shuffle it in this fashion: > > > > > > > > r0.0:DF x0 y0 x1 y1 > > > > r0.1:DF z0 w0 z1 w1 > > > > > > > > and emit two 32-bit write messages, one for r0.0 at base_offset > > > > and another one for r0.1 at base_offset+16. > > > > > > > > We also need to do the inverse operation when we read using 32- > > > > bit > > > > messages > > > > to produce valid SIMD4x2 64bit data from the data read. We can > > > > achieve this > > > > by aplying the exact same shuffling to the data read, although > > > > we > > > > need to > > > > apply different channel enables since the layout of the data is > > > > reversed. > > > > > > > > This helper implements the data shuffling logic and we will use > > > > it > > > > in > > > > various places where we read and write 64bit data from/to > > > > memory. > > > > --- > > > > src/mesa/drivers/dri/i965/brw_vec4.h | 5 ++ > > > > src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 96 > > > > ++++++++++++++++++++++++++++++ > > > > 2 files changed, 101 insertions(+) > > > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h > > > > b/src/mesa/drivers/dri/i965/brw_vec4.h > > > > index 26228d0..3337fc0 100644 > > > > --- a/src/mesa/drivers/dri/i965/brw_vec4.h > > > > +++ b/src/mesa/drivers/dri/i965/brw_vec4.h > > > > @@ -327,6 +327,11 @@ public: > > > > > > > > src_reg setup_imm_df(double v); > > > > > > > > + vec4_instruction *shuffle_64bit_data(dst_reg dst, src_reg > > > > src, > > > > + bool for_write, > > > > + bblock_t *block = > > > > NULL, > > > > + vec4_instruction *ref > > > > = > > > > NULL); > > > > + > > > > virtual void emit_nir_code(); > > > > virtual void nir_setup_uniforms(); > > > > virtual void > > > > nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr); > > > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > > > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > > > index 450db92..346e822 100644 > > > > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > > > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > > > @@ -2145,4 +2145,100 @@ > > > > vec4_visitor::nir_emit_undef(nir_ssa_undef_instr *instr) > > > > dst_reg(VGRF, alloc.allocate(instr->def.bit_size / 32)); > > > > } > > > > > > > > +/* SIMD4x2 64bit data is stored in register space like this: > > > > + * > > > > + * r0.0:DF x0 y0 z0 w0 > > > > + * r0.1:DF x1 y1 z1 w1 > > > > + * > > > > + * When we need to write data such as this to memory using 32- > > > > bit > > > > write > > > > + * messages we need to shuffle it in this fashion: > > > > + * > > > > + * r0.0:DF x0 y0 x1 y1 (to be written at base offset) > > > > + * r0.0:DF z0 w0 z1 w1 (to be written at base offset + 16) > > > > + * > > > > + * We need to do the inverse operation when we read using 32- > > > > bit > > > > messages, > > > > + * which we can do by applying the same exact shuffling on the > > > > 64- > > > > bit data > > > > + * read, only that because the data for each vertex is > > > > positioned > > > > differently > > > > + * we need to apply different channel enables. > > > > + * > > > > + * This function takes 64bit data and shuffles it as explained > > > > above. > > > > + * > > > > + * The @for_write parameter is used to specify if the > > > > shuffling is > > > > being done > > > > + * for proper SIMD4x2 64-bit data that needs to be shuffled > > > > prior > > > > to a 32-bit > > > > + * write message (for_write = true), or instead we are doing > > > > the > > > > inverse > > > > + * opperation and we have just read 64-bit data using a 32-bit > > > > messages that we > > > > + * need to shuffle to create valid SIMD4x2 64-bit data > > > > (for_write > > > > = false). > > > > + * > > > > + * If @block and @ref are non-NULL, then the shuffling is done > > > > after @ref, > > > > + * otherwise the instructions are emitted normally at the end. > > > > The > > > > function > > > > + * returns the last instruction inserted. > > > > + * > > > > + * Notice that @src and @dst cannot be the same register. > > > > + */ > > > > +vec4_instruction * > > > > +vec4_visitor::shuffle_64bit_data(dst_reg dst, src_reg src, > > > > bool > > > > for_write, > > > > + bblock_t *block, > > > > vec4_instruction > > > > *ref) > > > > +{ > > > > + assert(type_sz(src.type) == 8); > > > > + assert(type_sz(dst.type) == 8); > > > > + assert(!src.in_range(dst, 2)); > > > > + assert(dst.writemask == WRITEMASK_XYZW); > > > > + assert(!ref == !block); > > > > + > > > > + vec4_instruction *inst, *last; > > > > + bool emit_before = ref != NULL; > > > > + > > > > + #define EMIT(i) \ > > > > + if (!emit_before) { \ > > > > + emit(i); \ > > > > + } else { \ > > > > + ref->insert_after(block, i); \ > > > > + ref = i; \ > > > > + } \ > > > > + last = i; > > > > + > > > > + /* Resolve swizzle in src */ > > > > + if (src.swizzle != BRW_SWIZZLE_XYZW) { > > > > + dst_reg data = dst_reg(this, glsl_type::dvec4_type); > > > > + inst = MOV(data, src); > > > > + EMIT(inst); > > > > + src = src_reg(data); > > > > + } > > > > + > > > > + /* dst+0.XY = src+0.XY */ > > > > + dst.writemask = WRITEMASK_XY; > > > > + inst = MOV(dst, src); > > > > + inst->regs_written = 1; > > > > + inst->exec_size = 4; > > > > + EMIT(inst); > > > > + > > > > + /* dst+0.ZW = src+1.XY */ > > > > + dst.writemask = WRITEMASK_ZW; > > > > + inst = MOV(dst, swizzle(offset(src, 1), BRW_SWIZZLE_XYXY)); > > > > + inst->regs_written = 1; > > > > + inst->exec_size = 4; > > > > + inst->group = for_write ? 4 : 0; > > > > + EMIT(inst); > > > > + > > > > + /* dst+1.XY = src+0.ZW */ > > > > + dst.writemask = WRITEMASK_XY; > > > > + inst = MOV(offset(dst, 1), swizzle(src, BRW_SWIZZLE_ZWZW)); > > > > + inst->regs_written = 1; > > > > + inst->exec_size = 4; > > > > + inst->group = for_write ? 0 : 4; > > > > + EMIT(inst); > > > > + > > > > + /* dst+1.ZW = src+1.ZW */ > > > > + dst.writemask = WRITEMASK_ZW; > > > > + inst = MOV(offset(dst, 1), offset(src, 1)); > > > > + inst->regs_written = 1; > > > > + inst->exec_size = 4; > > > > + inst->group = 4; > > > > + EMIT(inst); > > > > + > > > > + #undef EMIT > > > > + > > > > + return last; > > > > +} > > > > + > > > This would be a *lot* simpler if you used and took as argument a > > > VEC4 > > > builder instead of constructing the IR manually (and if you used > > > the > > > writemask() helper). It might need some small fixes for the VEC4 > > > builder to initialize the new group and exec_size fields > > > correctly > > > but > > > it seems worth doing just because of this function alone. > > Sure, I can do that. About using the writemask() helper, the > > problem > > with it is that it doesn't set the input writemask on the dst, > > instead > > it sets the result of "dst.writemask & mask", and when we are > > shuffling > > we really want to force the writemask independently of any previous > > writemask the dst register had. > Why? If I give a register r with writemask XY to a function 'foo', I > wouldn't expect it to ignore the writemask I told it to use and > instead > use a different writemask not contained in the original > writemask. If > you're doing that right now you likely have a bug.
Shuffled data is always written to a new vgrf because (so there is never data to preserve in the dst) and it is only used specifically right before a write where we honor the writemask on the dst of the write operation or right after a read (where we the swizzle on the source takes care of reading the data channels it needs from the shuffled result), so I think this is probably fine. > > > > Probably the dst we get here has an XYZW writemask anyway since it > > would typically be a temporary vgrf we have just created to store > > the > > result of the shuffling operation. Maybe we should just add an > > assert > > to ensure that, and then we should be free to use the writemask() > > helper. > > > Looks like you have one already: > > + assert(dst.writemask == WRITEMASK_XYZW); > > Question is do you need it only because you're bashing the writemasks > directly instead of using writemask(), or is there any other reason > for > the assert? I don't think we really need this as a requirement and we can honor any writemask we get. That said, I don't think that would be immediately useful in any scenario, at least right now. As a side note, notice that using writemasks with shuffling operations requires to think a bit about what we are really doing in each case: if we use an XY writemask when we shuffle before a write, we are shuffling XYZW components of just one (the first) of the SIMD4x2 threads. If we do the same thing when we shuffle the result of a read then we are writing components XY of both threads. Either way, if some code path really wants to shuffle with a writemask set and it knows what it is doing I guess there is no reason to stand in the way. > > > > > > > > > > > > > > > > > } _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev