Chad Versace <chad.vers...@linux.intel.com> writes:

> diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
> index 324e665..9b54796 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
> @@ -923,6 +923,96 @@ fs_generator::generate_set_global_offset(fs_inst *inst,
>  }
>
>  void
> +fs_generator::generate_pack_half_2x16_split(fs_inst *inst,
> +                                            struct brw_reg dst,
> +                                            struct brw_reg x,
> +                                            struct brw_reg y)
> +{
> +   assert(intel->gen >= 7);
> +   assert(dst.type == BRW_REGISTER_TYPE_UD);
> +   assert(x.type = BRW_REGISTER_TYPE_F);
> +   assert(y.type = BRW_REGISTER_TYPE_F);
> +
> +   /* From the Ivybridge PRM, Vol4, Part3, Section 6.27 f32to16:
> +    *
> +    *   Because this instruction does not have a 16-bit floating-point type,
> +    *   the destination data type must be Word (W).
> +    *
> +    *   The destination must be DWord-aligned and specify a horizontal stride
> +    *   (HorzStride) of 2. The 16-bit result is stored in the lower word of
> +    *   each destination channel and the upper word is not modified.
> +    */
> +
> +   /* Give each 32-bit channel of dst the form below , where "." means
> +    * unchanged.
> +    *   0x....hhhh
> +    *
> +    * Per the PRM, change the source data type to W. To compensate for
> +    * halving the data type width, double the horizontal stride. (The
> +    * BRW_*_STRIDE enums are defined so that incrementing the field doubles
> +    * the real stride).
> +    */
> +   dst.type = BRW_REGISTER_TYPE_W;
> +   if (dst.hstride != 0)
> +      ++dst.hstride;
> +   brw_F32TO16(p, dst, y);
> +
> +   /* Now the form:
> +    *   0xhhhh0000
> +    */
> +   dst.type = BRW_REGISTER_TYPE_UD;
> +   if (dst.hstride != 0)
> +      --dst.hstride;

Perhaps use a temporary named "dst_uw" that's uw-typed, instead of having
dst pop in and out of being word-typed?  And add a local static function
to make one from a uint fs_reg, so you don't have to explain the hstride
and vstride increment twice (the vstride increment is apparently dropped
from this function but still present in the next one, which is odd).

Functionally, the code looks good now.  The missing vstride increment
happens to be safe, because either vstride is 0, or width == execsize and
so vstride is unused.

> +   brw_SHL(p, dst, dst, brw_imm_ud(16u));
> +
> +   /* And, finally the form of packHalf2x16's output:
> +    *   0xhhhhllll
> +    */
> +   dst.type = BRW_REGISTER_TYPE_W;
> +   if (dst.hstride != 0)
> +      ++dst.hstride;
> +   brw_F32TO16(p, dst, x);
> +}
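
To make the temporary-plus-helper suggestion concrete, here is a rough,
self-contained sketch.  It does not use the real Mesa definitions: the
"reg" struct, the TYPE_* enum, and the names retype_ud_as_uw() and
real_stride() are all hypothetical stand-ins, and only the stride encoding
(incrementing the field doubles the real stride) is taken from the patch
comment above.

```cpp
#include <cassert>

// Hypothetical stand-ins for the driver's register description.  Only the
// fields needed for this illustration are modelled.
enum reg_type { TYPE_UD, TYPE_UW };

struct reg {
   reg_type type;
   unsigned hstride;   // encoded: 0 -> 0, 1 -> 1, 2 -> 2, 3 -> 4 elements
   unsigned vstride;   // same encoding, continued upward
};

// Decode an encoded stride field into an element count.
static unsigned
real_stride(unsigned field)
{
   return field == 0 ? 0 : 1u << (field - 1);
}

// Return a UW-typed view of a UD-typed register.  Halving the element
// width requires doubling the strides, which in the encoded form is an
// increment.  A stride of 0 (scalar) is left alone.  The caller's
// register is passed by value, so it keeps its UD type and strides.
static reg
retype_ud_as_uw(reg r)
{
   assert(r.type == TYPE_UD);
   r.type = TYPE_UW;
   if (r.hstride != 0)
      ++r.hstride;
   if (r.vstride != 0)
      ++r.vstride;
   return r;
}
```

With something along these lines, the generator could compute dst_uw once
and use it for both F32TO16 writes, keeping dst itself UD-typed for the
SHL, so the stride adjustment is explained in a single place.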
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev