Iago Toral <ito...@igalia.com> writes: > On Mon, 2016-09-12 at 14:19 -0700, Francisco Jerez wrote: >> Iago Toral Quiroga <ito...@igalia.com> writes: >> >> > >> > SIMD4x2 64bit data is stored in register space like this: >> > >> > r0.0:DF x0 y0 z0 w0 >> > r0.1:DF x1 y1 z1 w1 >> > >> > When we need to write data such as this to memory using 32-bit >> > write >> > messages we need to shuffle it in this fashion: >> > >> > r0.0:DF x0 y0 x1 y1 >> > r0.1:DF z0 w0 z1 w1 >> > >> > and emit two 32-bit write messages, one for r0.0 at base_offset >> > and another one for r0.1 at base_offset+16. >> > >> > We also need to do the inverse operation when we read using 32-bit >> > messages >> > to produce valid SIMD4x2 64bit data from the data read. We can >> > achieve this >> > by aplying the exact same shuffling to the data read, although we >> > need to >> > apply different channel enables since the layout of the data is >> > reversed. >> > >> > This helper implements the data shuffling logic and we will use it >> > in >> > various places where we read and write 64bit data from/to memory. >> > --- >> > src/mesa/drivers/dri/i965/brw_vec4.h | 5 ++ >> > src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 96 >> > ++++++++++++++++++++++++++++++ >> > 2 files changed, 101 insertions(+) >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h >> > b/src/mesa/drivers/dri/i965/brw_vec4.h >> > index 26228d0..3337fc0 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_vec4.h >> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.h >> > @@ -327,6 +327,11 @@ public: >> > >> > src_reg setup_imm_df(double v); >> > >> > + vec4_instruction *shuffle_64bit_data(dst_reg dst, src_reg src, >> > + bool for_write, >> > + bblock_t *block = NULL, >> > + vec4_instruction *ref = >> > NULL); >> > + >> > virtual void emit_nir_code(); >> > virtual void nir_setup_uniforms(); >> > virtual void >> > nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr); >> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> > index 450db92..346e822 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> > @@ -2145,4 +2145,100 @@ >> > vec4_visitor::nir_emit_undef(nir_ssa_undef_instr *instr) >> > dst_reg(VGRF, alloc.allocate(instr->def.bit_size / 32)); >> > } >> > >> > +/* SIMD4x2 64bit data is stored in register space like this: >> > + * >> > + * r0.0:DF x0 y0 z0 w0 >> > + * r0.1:DF x1 y1 z1 w1 >> > + * >> > + * When we need to write data such as this to memory using 32-bit >> > write >> > + * messages we need to shuffle it in this fashion: >> > + * >> > + * r0.0:DF x0 y0 x1 y1 (to be written at base offset) >> > + * r0.0:DF z0 w0 z1 w1 (to be written at base offset + 16) >> > + * >> > + * We need to do the inverse operation when we read using 32-bit >> > messages, >> > + * which we can do by applying the same exact shuffling on the 64- >> > bit data >> > + * read, only that because the data for each vertex is positioned >> > differently >> > + * we need to apply different channel enables. >> > + * >> > + * This function takes 64bit data and shuffles it as explained >> > above. >> > + * >> > + * The @for_write parameter is used to specify if the shuffling is >> > being done >> > + * for proper SIMD4x2 64-bit data that needs to be shuffled prior >> > to a 32-bit >> > + * write message (for_write = true), or instead we are doing the >> > inverse >> > + * opperation and we have just read 64-bit data using a 32-bit >> > messages that we >> > + * need to shuffle to create valid SIMD4x2 64-bit data (for_write >> > = false). >> > + * >> > + * If @block and @ref are non-NULL, then the shuffling is done >> > after @ref, >> > + * otherwise the instructions are emitted normally at the end. The >> > function >> > + * returns the last instruction inserted. >> > + * >> > + * Notice that @src and @dst cannot be the same register. >> > + */ >> > +vec4_instruction * >> > +vec4_visitor::shuffle_64bit_data(dst_reg dst, src_reg src, bool >> > for_write, >> > + bblock_t *block, vec4_instruction >> > *ref) >> > +{ >> > + assert(type_sz(src.type) == 8); >> > + assert(type_sz(dst.type) == 8); >> > + assert(!src.in_range(dst, 2)); >> > + assert(dst.writemask == WRITEMASK_XYZW); >> > + assert(!ref == !block); >> > + >> > + vec4_instruction *inst, *last; >> > + bool emit_before = ref != NULL; >> > + >> > + #define EMIT(i) \ >> > + if (!emit_before) { \ >> > + emit(i); \ >> > + } else { \ >> > + ref->insert_after(block, i); \ >> > + ref = i; \ >> > + } \ >> > + last = i; >> > + >> > + /* Resolve swizzle in src */ >> > + if (src.swizzle != BRW_SWIZZLE_XYZW) { >> > + dst_reg data = dst_reg(this, glsl_type::dvec4_type); >> > + inst = MOV(data, src); >> > + EMIT(inst); >> > + src = src_reg(data); >> > + } >> > + >> > + /* dst+0.XY = src+0.XY */ >> > + dst.writemask = WRITEMASK_XY; >> > + inst = MOV(dst, src); >> > + inst->regs_written = 1; >> > + inst->exec_size = 4; >> > + EMIT(inst); >> > + >> > + /* dst+0.ZW = src+1.XY */ >> > + dst.writemask = WRITEMASK_ZW; >> > + inst = MOV(dst, swizzle(offset(src, 1), BRW_SWIZZLE_XYXY)); >> > + inst->regs_written = 1; >> > + inst->exec_size = 4; >> > + inst->group = for_write ? 4 : 0; >> > + EMIT(inst); >> > + >> > + /* dst+1.XY = src+0.ZW */ >> > + dst.writemask = WRITEMASK_XY; >> > + inst = MOV(offset(dst, 1), swizzle(src, BRW_SWIZZLE_ZWZW)); >> > + inst->regs_written = 1; >> > + inst->exec_size = 4; >> > + inst->group = for_write ? 0 : 4; >> > + EMIT(inst); >> > + >> > + /* dst+1.ZW = src+1.ZW */ >> > + dst.writemask = WRITEMASK_ZW; >> > + inst = MOV(offset(dst, 1), offset(src, 1)); >> > + inst->regs_written = 1; >> > + inst->exec_size = 4; >> > + inst->group = 4; >> > + EMIT(inst); >> > + >> > + #undef EMIT >> > + >> > + return last; >> > +} >> > + >> This would be a *lot* simpler if you used and took as argument a VEC4 >> builder instead of constructing the IR manually (and if you used the >> writemask() helper). It might need some small fixes for the VEC4 >> builder to initialize the new group and exec_size fields correctly >> but >> it seems worth doing just because of this function alone. > > Sure, I can do that. About using the writemask() helper, the problem > with it is that it doesn't set the input writemask on the dst, instead > it sets the result of "dst.writemask & mask", and when we are shuffling > we really want to force the writemask independently of any previous > writemask the dst register had.
Why? If I give a register r with writemask XY to a function 'foo', I wouldn't expect it to ignore the writemask I told it to use and instead use a different writemask not contained in the original writemask. If you're doing that right now you likely have a bug. > Probably the dst we get here has an XYZW writemask anyway since it > would typically be a temporary vgrf we have just created to store the > result of the shuffling operation. Maybe we should just add an assert > to ensure that, and then we should be free to use the writemask() > helper. > Looks like you have one already: + assert(dst.writemask == WRITEMASK_XYZW); Question is do you need it only because you're bashing the writemasks directly instead of using writemask(), or is there any other reason for the assert? >> > >> > }
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev