Chad Versace <chad.vers...@linux.intel.com> writes:

> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index ebf8990..b5f1aae 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -348,6 +348,143 @@ vec4_visitor::emit_math(enum opcode opcode,
>  }
>
>  void
> +vec4_visitor::emit_pack_half_2x16(dst_reg dst, src_reg src0)
> +{
> +   if (intel->gen < 7)
> +      assert(!"ir_unop_pack_half_2x16 should be lowered");
> +
> +   /* uint dst; */
> +   assert(dst.type == BRW_REGISTER_TYPE_UD);
> +
> +   /* vec2 src0; */
> +   assert(src0.type == BRW_REGISTER_TYPE_F);
> +
> +   /* uvec2 tmp;
> +    *
> +    * The PRM lists the destination type of f32to16 as W.  However, I've
> +    * experimentally confirmed on gen7 that it must be a 32-bit size, such as
> +    * UD, in align16 mode.
> +    */
> +   dst_reg tmp_dst(this, glsl_type::uvec2_type);
> +   src_reg tmp_src(tmp_dst);
> +
> +   /* tmp.xy = f32to16(src0); */
> +   tmp_dst.writemask = WRITEMASK_XY;
> +   emit(new(mem_ctx) vec4_instruction(this, BRW_OPCODE_F32TO16,
> +                                      tmp_dst, src0));
> +
> +   /* The result's high 16 bits are in the low 16 bits of the temporary
> +    * register's Y channel.  The result's low 16 bits are in the low 16 bits
> +    * of the X channel.
> +    *
> +    * In experiments on gen7 I've found the that, in the temporary register,
> +    * the hight 16 bits of the X and Y channels are zeros. This is critical

"high"

> +    * for the SHL and OR instructions below to work as expected.
> +    */

The docs say that the high bits are unchanged. The temporary reg will
often have already had 0 in it to begin with, but sometimes not. Have
you confirmed that the high bits of the x channel were changed to 0 if
you had initialized them to non-zero?

> +   /* Idea for reducing the above number of registers and instructions
> +    * ----------------------------------------------------------------
> +    *
> +    * It should be possible to remove the temporary register and replace the
> +    * SHL and OR instructions above with a single MOV instruction mode in
> +    * align1 mode that uses clever register region addressing. (It is
> +    * impossible to specify the necessary register regions in align16 mode).
> +    * Unfortunately, it is difficult to emit an align1 instruction here.
> +    *
> +    * In particular, I want to do this:
> +    *
> +    *   # Give dst the form:
> +    *   #
> +    *   #    w z          y          x w z          y          x
> +    *   #  |0|0|0x0000hhhh|0x0000llll|0|0|0x0000hhhh|0x0000llll|
> +    *   #
> +    *   f32to16(8) dst<1>.xy:UD src<4;4,1>:F {align16}
> +    *
> +    *   # Transform dst into the form of packHalf2x16's output.
> +    *   #
> +    *   #    w z          y          x w z          y          x
> +    *   #  |0|0|0x00000000|0xhhhhllll|0|0|0x00000000|0xhhhhllll|
> +    *   #
> +    *   # Use width=2 in order to move the Y channel's high 16 bits
> +    *   # into the low 16 bits, thus clearing the Y channel to zero.
> +    *   #
> +    *   mov(4) dst.1<1>:UW dst.2<8;2,1>:UW {align1}
> +    */

I like the sound of this, and it would be a matter of making a new
VS_OPCODE that the generator implements.

> +}
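To make the concern about the X channel's high bits concrete, here is a small host-side sketch in plain C++ (not driver code). It assumes the combine step amounts to (y << 16) | x, which is my reading of "the SHL and OR instructions" in the quoted comment. The function names and the 0xdead stale value are hypothetical, chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical model of the SHL + OR step. 'x' and 'y' stand for the X and Y
// channels of the temporary register after f32to16; this is an assumption
// about the combine step, not the code from the patch.
constexpr uint32_t combine_unmasked(uint32_t x, uint32_t y)
{
    return (y << 16) | x;
}

// Same step, but clearing the high 16 bits of X first so that stale register
// contents cannot leak into the packed result.
constexpr uint32_t combine_masked(uint32_t x, uint32_t y)
{
    return (y << 16) | (x & 0xffffu);
}

constexpr uint32_t lo = 0x3c00u;   // half-float encoding of 1.0
constexpr uint32_t hi = 0xc000u;   // half-float encoding of -2.0
constexpr uint32_t expected = 0xc0003c00u;

// If f32to16 really zeroes the high bits, the plain SHL + OR is correct.
static_assert(combine_unmasked(lo, hi) == expected, "clean X channel");

// If the high bits of X are left unchanged and held stale data, the plain
// SHL + OR is wrong, while the masked variant still gives the right answer.
constexpr uint32_t stale_x = 0xdead0000u | lo;
static_assert(combine_unmasked(stale_x, hi) != expected, "stale bits leak");
static_assert(combine_masked(stale_x, hi) == expected, "mask fixes it");
```

Stale high bits in the Y channel are harmless under this model, since the shift discards them; only the X channel matters, which is why the question above is about X specifically.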
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev