On 21 January 2013 00:49, Chad Versace <chad.vers...@linux.intel.com> wrote:
> Lower them to arithmetic and bit manipulation expressions.
>
> v2:
> - Rewrite using ir_builder. [for idr]
> - In lowering packHalf2x16, don't truncate subnormal float16 values to zero.
>   And round to even rather than to zero. [for stereotype441]
>
> CC: Ian Romanick <i...@freedesktop.org>
> CC: Paul Berry <stereotype...@gmail.com>
> Signed-off-by: Chad Versace <chad.vers...@linux.intel.com>
> ---
> src/glsl/Makefile.sources           |    1 +
> src/glsl/ir_optimization.h          |   20 +
> src/glsl/lower_packing_builtins.cpp | 1043 +++++++++++++++++++++++++++++++++++
> 3 files changed, 1064 insertions(+)
> create mode 100644 src/glsl/lower_packing_builtins.cpp
>
>

(snip)

> +   void
> +   setup_factory(void *mem_ctx)
> +   {
> +      assert(factory.mem_ctx == NULL);
> +      factory.mem_ctx = mem_ctx;
> +
> +      /* Avoid making a new list for each call to handle_rvalue(). Make a
> +       * single list and reuse it.
> +       */
> +      if (factory.instructions == NULL) {
> +         factory.instructions = new(NULL) exec_list();
> +      } else {
> +         assert(factory.instructions->is_empty());
> +      }
> +   }
>

Do we need factory.instructions to be heap-allocated? How about just making a private exec_list inside lower_packing_builtins_visitor and setting factory.instructions to point to it in the lower_packing_builtins_visitor constructor?

(snip)

> +   /**
> +    * \brief Lower the component-wise calculation of packHalf2x16.
> +    *
> +    * \param f_rval is one component of packHalf2x16's input
> +    * \param e_rval is the unshifted exponent bits of f_rval
> +    * \param m_rval is the unshifted mantissa bits of f_rval
> +    *
> +    * \return a uint rvalue that encodes a float16 in its lower 16 bits
> +    */
> +   ir_rvalue*
> +   pack_half_1x16_nosign(ir_rvalue *f_rval,
> +                         ir_rvalue *e_rval,
> +                         ir_rvalue *m_rval)
> +   {
> +      assert(e_rval->type == glsl_type::uint_type);
> +      assert(m_rval->type == glsl_type::uint_type);
> +
> +      /* uint u16; */
> +      ir_variable *u16 = factory.make_temp(glsl_type::uint_type,
> +                                           "tmp_pack_half_1x16_u16");
> +
> +      /* float f = FLOAT_RVAL; */
> +      ir_variable *f = factory.make_temp(glsl_type::float_type,
> +                                         "tmp_pack_half_1x16_f");
> +      factory.emit(assign(f, f_rval));
> +
> +      /* uint e = E_RVAL; */
> +      ir_variable *e = factory.make_temp(glsl_type::uint_type,
> +                                         "tmp_pack_half_1x16_e");
> +      factory.emit(assign(e, e_rval));
> +
> +      /* uint m = M_RVAL; */
> +      ir_variable *m = factory.make_temp(glsl_type::uint_type,
> +                                         "tmp_pack_half_1x16_m");
> +      factory.emit(assign(m, m_rval));
> +
> +      /* Preliminaries
> +       * -------------
> +       *
> +       * For a float16, the bit layout is:
> +       *
> +       *   sign:     15
> +       *   exponent: 10:14
> +       *   mantissa: 0:9
> +       *
> +       * Let f16 be a float16 value. The sign, exponent, and mantissa
> +       * determine its value thus:
> +       *
> +       *   if e16 = 0 and m16 = 0, then zero:       (-1)^s16 * 0                               (1)
> +       *   if e16 = 0 and m16 != 0, then subnormal: (-1)^s16 * 2^(e16 - 14) * (m16 / 2^10)     (2)
> +       *   if 0 < e16 < 31, then normal:            (-1)^s16 * 2^(e16 - 15) * (1 + m16 / 2^10) (3)
> +       *   if e16 = 31 and m16 = 0, then infinite:  (-1)^s16 * inf                             (4)
> +       *   if e16 = 31 and m16 != 0, then NaN                                                  (5)
> +       *
> +       * where 0 <= m16 < 2^10.
> +       *
> +       * For a float32, the bit layout is:
> +       *
> +       *   sign:     31
> +       *   exponent: 23:30
> +       *   mantissa: 0:22
> +       *
> +       * Let f32 be a float32 value.
> +       * The sign, exponent, and mantissa
> +       * determine its value thus:
> +       *
> +       *   if e32 = 0 and m32 = 0, then zero:       (-1)^s * 0                                (10)
> +       *   if e32 = 0 and m32 != 0, then subnormal: (-1)^s * 2^(e32 - 126) * (m32 / 2^23)     (11)
> +       *   if 0 < e32 < 255, then normal:           (-1)^s * 2^(e32 - 127) * (1 + m32 / 2^23) (12)
> +       *   if e32 = 255 and m32 = 0, then infinite: (-1)^s * inf                              (13)
> +       *   if e32 = 255 and m32 != 0, then NaN                                                (14)
> +       *
> +       * where 0 <= m32 < 2^23.
> +       *
> +       * The minimum and maximum normal float16 values are
> +       *
> +       *   min_norm16 = 2^(1 - 15) * (1 + 0 / 2^10) = 2^(-14)   (20)
> +       *   max_norm16 = 2^(30 - 15) * (1 + 1023 / 2^10)         (21)
> +       *
> +       * The step at max_norm16 is
> +       *
> +       *   max_step16 = 2^5                                     (22)
> +       *
> +       * Observe that the float16 boundary values in equations 20-21 lie in the
> +       * range of normal float32 values.
> +       *
> +       *
> +       * Rounding Behavior
> +       * -----------------
> +       * Not all float32 values can be exactly represented as a float16. We
> +       * round all such intermediate float32 values to the nearest float16; if
> +       * the float32 is exactly between two float16 values, we round to the one
> +       * with an even mantissa. This rounding behavior has several benefits:
> +       *
> +       *   - It has no sign bias.
> +       *
> +       *   - It reproduces the behavior of real hardware: opcode F32TO16 in Intel's
> +       *     GPU ISA.
> +       *
> +       *   - By reproducing the behavior of the GPU (at least on Intel hardware),
> +       *     compile-time evaluation of constant packHalf2x16 GLSL expressions will
> +       *     result in the same value as if the expression were executed on the
> +       *     GPU.
> +       *
> +       * Calculation
> +       * -----------
> +       * Our task is to compute s16, e16, m16 given f32. Since this function
> +       * ignores the sign bit, assume that s32 = s16 = 0. There are several
> +       * cases to consider.
> +       */
> +
> +      factory.emit(
> +
> +         /* Case 1) f32 is NaN
> +          *
> +          *   The resultant f16 will also be NaN.
> +          */
> +
> +         /* if (e32 == 255 && m32 != 0) { */
> +         if_tree(logic_and(equal(e, constant(0xffu << 23u)),
> +                           logic_not(equal(m, constant(0u)))),
> +
> +            assign(u16, constant(0x7fffu)),
> +
> +         /* Case 2) f32 lies in the range [0, min_norm16).
> +          *
> +          *   The resultant float16 will be either zero, subnormal, or normal.
> +          *
> +          *   Solving
> +          *
> +          *     f32 = min_norm16       (30)
> +          *
> +          *   gives
> +          *
> +          *     e32 = 113 and m32 = 0  (31)
> +          *
> +          *   Therefore this case occurs if and only if
> +          *
> +          *     e32 < 113              (32)
> +          */
> +
> +         /* } else if (e32 < 113) { */
> +         if_tree(less(e, constant(113u << 23u)),
> +
> +            /* u16 = uint(round_to_even(abs(f32) * float(1u << 24u))); */
> +            assign(u16, f2u(round_even(mul(expr(ir_unop_abs, f),
> +                                           constant((float) (1 << 24)))))),
> +
> +         /* Case 3) f32 lies in the range
> +          *         [min_norm16, max_norm16 + max_step16).
> +          *
> +          *   The resultant float16 will be either normal or infinite.
> +          *
> +          *   Solving
> +          *
> +          *     f32 = max_norm16 + max_step16           (40)
> +          *         = 2^15 * (1 + 1023 / 2^10) + 2^5    (41)
> +          *         = 2^16                              (42)
> +          *   gives
> +          *
> +          *     e32 = 142 and m32 = 0                   (43)
>

I calculate this to be 143, not 142.

> +          *
> +          *   We already solved the boundary condition f32 = min_norm16 above
> +          *   in equation 31. Therefore this case occurs if and only if
> +          *
> +          *     113 <= e32 and e32 < 142
>

So this should be e32 < 143.

> +          */
> +
> +         /* } else if (e32 < 142) { */
> +         if_tree(lequal(e, constant(142u << 23u)),
>

Fortunately, since you use "lequal" here, you get the correct effect.

> +
> +            /* The addition below handles the case where the mantissa rounds
> +             * up to 1024 and bumps the exponent.
> +             *
> +             * u16 = ((e - (112u << 23u)) >> 13u)
> +             *     + round_to_even(float(m) / (1u << 13u));
> +             */
> +            assign(u16, add(rshift(sub(e, constant(112u << 23u)),
> +                                   constant(13u)),
> +                            f2u(round_even(
> +                               div(u2f(m), constant((float) (1 << 13))))))),
> +
> +         /* Case 4) f32 lies in the range [max_norm16 + max_step16, inf].
> +          *
> +          *   The resultant float16 will be infinite.
> +          *
> +          *   The cases above caught all float32 values in the range
> +          *   [0, max_norm16 + max_step16), so this is the fall-through case.
> +          */
> +
> +         /* } else { */
> +
> +            assign(u16, constant(31u << 10u))))));
> +
> +         /* } */
> +
> +      return deref(u16).val;
> +   }
>

(snip)

> +   /**
> +    * \brief Lower the component-wise calculation of unpackHalf2x16.
> +    *
> +    * Given a uint that encodes a float16 in its lower 16 bits, this function
> +    * returns a uint that encodes a float32 with the same value. The sign bit
> +    * of the float16 is ignored.
> +    *
> +    * \param e_rval is the unshifted exponent bits of a float16
> +    * \param m_rval is the unshifted mantissa bits of a float16
> +    * \return a uint rvalue that encodes a float32
> +    */
> +   ir_rvalue*
> +   unpack_half_1x16_nosign(ir_rvalue *e_rval, ir_rvalue *m_rval)
> +   {
> +      assert(e_rval->type == glsl_type::uint_type);
> +      assert(m_rval->type == glsl_type::uint_type);
> +
> +      /* uint u32; */
> +      ir_variable *u32 = factory.make_temp(glsl_type::uint_type,
> +                                           "tmp_unpack_half_1x16_u32");
> +
> +      /* uint e = E_RVAL; */
> +      ir_variable *e = factory.make_temp(glsl_type::uint_type,
> +                                         "tmp_unpack_half_1x16_e");
> +      factory.emit(assign(e, e_rval));
> +
> +      /* uint m = M_RVAL; */
> +      ir_variable *m = factory.make_temp(glsl_type::uint_type,
> +                                         "tmp_unpack_half_1x16_m");
> +      factory.emit(assign(m, m_rval));
> +
> +      /* Preliminaries
> +       * -------------
> +       *
> +       * For a float16, the bit layout is:
> +       *
> +       *   sign:     15
> +       *   exponent: 10:14
> +       *   mantissa: 0:9
> +       *
> +       * Let f16 be a float16 value.
> +       * The sign, exponent, and mantissa
> +       * determine its value thus:
> +       *
> +       *   if e16 = 0 and m16 = 0, then zero:       (-1)^s16 * 0                               (1)
> +       *   if e16 = 0 and m16 != 0, then subnormal: (-1)^s16 * 2^(e16 - 14) * (m16 / 2^10)     (2)
> +       *   if 0 < e16 < 31, then normal:            (-1)^s16 * 2^(e16 - 15) * (1 + m16 / 2^10) (3)
> +       *   if e16 = 31 and m16 = 0, then infinite:  (-1)^s16 * inf                             (4)
> +       *   if e16 = 31 and m16 != 0, then NaN                                                  (5)
> +       *
> +       * where 0 <= m16 < 2^10.
> +       *
> +       * For a float32, the bit layout is:
> +       *
> +       *   sign:     31
> +       *   exponent: 23:30
> +       *   mantissa: 0:22
> +       *
> +       * Let f32 be a float32 value. The sign, exponent, and mantissa
> +       * determine its value thus:
> +       *
> +       *   if e32 = 0 and m32 = 0, then zero:       (-1)^s * 0                                (10)
> +       *   if e32 = 0 and m32 != 0, then subnormal: (-1)^s * 2^(e32 - 126) * (m32 / 2^23)     (11)
> +       *   if 0 < e32 < 255, then normal:           (-1)^s * 2^(e32 - 127) * (1 + m32 / 2^23) (12)
> +       *   if e32 = 255 and m32 = 0, then infinite: (-1)^s * inf                              (13)
> +       *   if e32 = 255 and m32 != 0, then NaN                                                (14)
> +       *
> +       * where 0 <= m32 < 2^23.
> +       *
> +       * Calculation
> +       * -----------
> +       * Our task is to compute s32, e32, m32 given f16. Since this function
> +       * ignores the sign bit, assume that s32 = s16 = 0. There are several
> +       * cases to consider.
> +       */
> +
> +      factory.emit(
> +
> +         /* Case 1) f16 is zero or subnormal.
> +          *
> +          *   The simplest method of calculating f32 in this case is
> +          *
> +          *     f32 = f16                       (20)
> +          *         = 2^(-14) * (m16 / 2^10)    (21)
> +          *         = m16 / 2^24                (22)
> +          */
> +
> +         /* if (e16 == 0) { */
> +         if_tree(equal(e, constant(0u)),
> +
> +            /* u32 = bitcast_f2u(float(m) / float(1 << 24)); */
> +            assign(u32, expr(ir_unop_bitcast_f2u,
> +                             div(u2f(m), constant((float)(1 << 24))))),
> +
> +         /* Case 2) f16 is normal.
> +          *
> +          *   The equation
> +          *
> +          *     f32 = f16                              (30)
> +          *     2^(e32 - 127) * (1 + m32 / 2^23) =     (31)
> +          *     2^(e16 - 15) * (1 + m16 / 2^10)
> +          *
> +          *   can be decomposed into two
> +          *
> +          *     2^(e32 - 127) = 2^(e16 - 15)           (32)
> +          *     1 + m32 / 2^23 = 1 + m16 / 2^10        (33)
> +          *
> +          *   which solve to
> +          *
> +          *     e32 = e16 + 112                        (34)
> +          *     m32 = m16 * 2^13                       (35)
> +          */
> +
> +         /* } else if (e16 < 31) { */
> +         if_tree(less(e, constant(31u << 10u)),
> +
> +            /* u32 = ((e << 13) + (112 << 23))
> +             *     | (m << 13);
> +             */
> +            assign(u32, bit_or(add(lshift(e, constant(13u)),
> +                                   constant(112u << 23u)),
> +                               lshift(m, constant(13u)))),
>

I believe you can save one operation by factoring out the "<< 13" to get:

    assign(u32, lshift(bit_or(add(e, constant(112u << 10u)), m), constant(13u)))

> +
> +         /* Case 3) f16 is infinite. */
> +         if_tree(equal(m, constant(0u)),
> +
> +            assign(u32, constant(255u << 23u)),
> +
> +         /* Case 4) f16 is NaN. */
> +         /* } else { */
> +
> +            assign(u32, constant(0x7fffffffu))))));
> +
> +         /* } */
> +
> +      return deref(u32).val;
> +   }
> +
>

(snip)

Well done! This is a tour de force, Chad. The only comment that I consider blocking is the 142 vs 143 mix-up I noted above, and even that is only in the comments. With that fixed, this patch is:

Reviewed-by: Paul Berry <stereotype...@gmail.com>
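
For anyone who wants to double-check the two arithmetic points raised in this review (the exponent of 2^16, and factoring out the "<< 13"), here is a small standalone C++ sketch. It is not part of the patch and does not use Mesa's IR. The helper names (f32_exponent, unpack_normal_patch, unpack_normal_factored, forms_agree_for_all_normals) are made up for illustration, and the sketch assumes the host's float is an IEEE 754 binary32. As in the patch, e and m are the unshifted float16 exponent and mantissa bits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Biased float32 exponent field (bits 23:30) of f.
// Assumption: float is IEEE 754 binary32 on this host.
uint32_t f32_exponent(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return (bits >> 23) & 0xffu;
}

// Case 2 of unpack_half_1x16_nosign, written as in the patch comment.
// e holds float16 bits 10:14 in place, m holds bits 0:9.
uint32_t unpack_normal_patch(uint32_t e, uint32_t m)
{
    assert((e & ~(0x1fu << 10)) == 0u && m < 1024u);
    return ((e << 13) + (112u << 23)) | (m << 13);
}

// The same case with the shift factored out, as suggested in the review.
uint32_t unpack_normal_factored(uint32_t e, uint32_t m)
{
    assert((e & ~(0x1fu << 10)) == 0u && m < 1024u);
    return ((e + (112u << 10)) | m) << 13;
}

// True if both forms give identical bits for every normal float16,
// that is, for 0 < e16 < 31 and 0 <= m16 < 2^10.
bool forms_agree_for_all_normals()
{
    for (uint32_t e16 = 1; e16 < 31; ++e16) {
        for (uint32_t m16 = 0; m16 < 1024; ++m16) {
            const uint32_t e = e16 << 10;
            if (unpack_normal_patch(e, m16) != unpack_normal_factored(e, m16))
                return false;
        }
    }
    return true;
}
```

Under those assumptions, f32_exponent(65536.0f) yields 16 + 127 = 143, which is the value equation 43 should carry, and f32_exponent applied to 2^(-14) yields 113, matching equation 31. The exhaustive loop covers only 30 * 1024 bit patterns, so checking every normal float16 is cheaper than arguing about carries by hand.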