On 01/23/2013 07:18 PM, Eric Anholt wrote:
> Chad Versace <chad.vers...@linux.intel.com> writes:
>> +void
>> +vec4_visitor::emit_unpack_half_2x16(dst_reg dst, src_reg src0)
>> +{
>> +   if (intel->gen < 7)
>> +      assert(!"ir_unop_unpack_half_2x16 should be lowered");
>> +
>> +   assert(dst.type == BRW_REGISTER_TYPE_F);
>> +   assert(src0.type == BRW_REGISTER_TYPE_UD);
>> +
>> +   /* From the Ivybridge PRM, Vol4, Part3, Section 6.26 f32to16:
>> +    *
>> +    *   Because this instruction does not have a 16-bit floating-point type,
>> +    *   the source data type must be Word (W). The destination type must be
>> +    *   F (Float).
>> +    *
>> +    * To use W as the source data type, we must adjust horizontal strides,
>> +    * which is only possible in align1 mode. All my [chadv] attempts at
>> +    * emitting align1 instructions for unpackHalf2x16 failed to pass the
>> +    * Piglit tests, so I gave up.
>> +    *
>> +    * I've verified that, on gen7, it is safe to emit f16to32 in align16 mode
>> +    * with UD as source data type.
>> +    */
>
> Have you tested this on something like:
>
> in uvec4 v;
> vec2 result = unpackHalf2x16(v.w);
>
> Those kinds of "the type must be X and the stride must be Y" have
> sometimes meant that it's just hardcoded and they don't look at what you
> program, so I'm concerned that some of your regioning
> (swizzle/abs/neg/uniformness) will just get thrown out by the hardware.
>
> But if it's passing on your tests with uniforms, it's probably OK.

In the brw code generated by my vs-packHalf2x16 test on IVB, the source to
f32to16 is swizzled as yz. If I recall correctly, for my vs-unpackHalf2x16
test, the source to f16to32 was also swizzled to a non-x channel. So I
think it's safe to say that this does the right thing.

>> +   dst_reg tmp_dst(this, glsl_type::uvec2_type);
>> +   src_reg tmp_src(tmp_dst);
>> +
>> +   /* tmp.x = src0 & 0xffffu; */
>> +   tmp_dst.writemask = WRITEMASK_X;
>> +   emit(new(mem_ctx) vec4_instruction(this, BRW_OPCODE_AND,
>> +                                      tmp_dst, src0, src_reg(0xffffu)));
>
> These ought to use the helper functions for simplicity:
> "emit(AND(tmp_dst, src0, src_reg(0xffffu)));" Check out the ALU1 macro
> for how to set up one of those to have a similar helper for F16TO32 if
> you want to match up the style.

Will do. FWIW, I'll also amend the "I've experimentally verified that the
hardware does what I want it to do" comments to state that the simulator
does it too, without complaint.

>> +
>> +   /* tmp.y = src0 >> 16u; */
>> +   tmp_dst.writemask = WRITEMASK_Y;
>> +   emit(new(mem_ctx) vec4_instruction(this, BRW_OPCODE_SHR,
>> +                                      tmp_dst, src0, src_reg(16u)));
>> +
>> +   /* dst.xy = f16to32(tmp); */
>> +   dst.writemask = WRITEMASK_XY;
>> +   emit(new(mem_ctx) vec4_instruction(this, BRW_OPCODE_F16TO32,
>> +                                      dst, tmp_src));
>> +}
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
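
For reference, here is a rough CPU-side sketch of what the three emitted
instructions (AND, SHR, F16TO32) are meant to compute for a single 32-bit
source channel. It is not Mesa code: the names Vec2, half_to_float,
unpack_half_2x16 and check_unpack_examples are hypothetical, and it assumes
IEEE 754 binary16 inputs and a 32-bit IEEE 754 float. It says nothing about
how the hardware handles regioning; it only models the arithmetic a test
could compare against.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical two-component result, standing in for dst.xy.
struct Vec2 { float x, y; };

// Convert one IEEE 754 binary16 bit pattern to a 32-bit float.
float half_to_float(uint16_t h)
{
    const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1fu;
    uint32_t mant = h & 0x3ffu;
    uint32_t bits;

    if (exp == 0x1fu) {
        // Inf or NaN: all-ones exponent, mantissa payload carried over.
        bits = sign | 0x7f800000u | (mant << 13);
    } else if (exp != 0) {
        // Normal number: rebias the exponent from 15 to 127.
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant != 0) {
        // Subnormal half: shift until the implicit leading bit appears.
        uint32_t e = 113u; // biased float exponent for 2^-14
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            --e;
        }
        bits = sign | (e << 23) | ((mant & 0x3ffu) << 13);
    } else {
        // Signed zero.
        bits = sign;
    }

    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Model of the sequence in the patch for one source channel.
Vec2 unpack_half_2x16(uint32_t src0)
{
    const uint16_t lo = static_cast<uint16_t>(src0 & 0xffffu); // tmp.x = src0 & 0xffffu
    const uint16_t hi = static_cast<uint16_t>(src0 >> 16);     // tmp.y = src0 >> 16u
    return { half_to_float(lo), half_to_float(hi) };           // dst.xy = f16to32(tmp)
}

// Small self-check: 0x3c00 is 1.0 and 0xc000 is -2.0 in binary16.
void check_unpack_examples()
{
    const Vec2 r = unpack_half_2x16(0xc0003c00u);
    assert(r.x == 1.0f);
    assert(r.y == -2.0f);
}
```

The low half lands in x and the high half in y, matching the writemasks
used in the patch (WRITEMASK_X for the AND, WRITEMASK_Y for the SHR).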