On Fri, Sep 29, 2017 at 4:48 AM, Iago Toral <ito...@igalia.com> wrote: > On Thu, 2017-09-28 at 23:05 -0700, Matt Turner wrote: >> ... without the float -> double conversion. Low power parts have >> additional restrictions when it comes to operating on 64-bit types, >> and >> the instruction used to do the conversion violates one of them: >> specifically, the restriction that "Source and Destination horizontal >> stride must be aligned to the same qword". >> >> Previously we generated a float and then converted, but we can avoid >> the >> conversion by using the same extract-the-sign-bit + or-in-1.0 >> algorithm >> by directly operating on the high four bytes of each double-precision >> component in the result. >> >> In SIMD8 and SIMD16 this cuts one instruction from the >> implementation, >> and more importantly that instruction is the one which violated the >> regioning restriction. >> >> Along the way I removed some comments that I did not think helped, >> and >> some code about double comparisons which does not seem to be >> necessary >> today. >> >> This prevents validation failures caught by the new EU validation >> code >> added in later patches. >> --- >> src/intel/compiler/brw_fs_nir.cpp | 49 +++++++-------------------- >> ------------ >> 1 file changed, 9 insertions(+), 40 deletions(-) >> >> diff --git a/src/intel/compiler/brw_fs_nir.cpp >> b/src/intel/compiler/brw_fs_nir.cpp >> index 8ea4297ceb..1b011172a6 100644 >> --- a/src/intel/compiler/brw_fs_nir.cpp >> +++ b/src/intel/compiler/brw_fs_nir.cpp >> @@ -697,49 +697,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, >> nir_alu_instr *instr) > > There is a comment right before this saying that CMP with SIMD16 > requires special handling that is no longer valid, so we want to remove > that too.
Oh, thanks. I missed that. >> * - We need to produce a DF result. >> */ >> >> - /* 2-src instructions can't have 64-bit immediates, so put >> 0.0 in >> - * a register and compare with that. >> - */ >> - fs_reg tmp = vgrf(glsl_type::double_type); >> - bld.MOV(tmp, setup_imm_df(bld, 0.0)); >> - >> - /* A direct DF CMP using the flag register (null dst) won't >> work in >> - * SIMD16 because the CMP will be split in two by >> lower_simd_width, >> - * resulting in two CMP instructions with the same dst >> (NULL), >> - * leading to dead code elimination of the first one. In >> SIMD8, >> - * however, there is no need to split the CMP and we can >> save some >> - * work. >> - */ >> - fs_reg dst_tmp = vgrf(glsl_type::double_type); >> - bld.CMP(dst_tmp, op[0], tmp, BRW_CONDITIONAL_NZ); >> - >> - /* In SIMD16 we want to avoid using a NULL dst register >> with DF CMP, >> - * so we store the result of the comparison in a vgrf >> instead and >> - * then we generate a UD comparison from that that won't >> have to >> - * be split by lower_simd_width. This is what NIR does to >> handle >> - * double comparisons in the general case. >> - */ >> - if (bld.dispatch_width() == 16 ) { >> - fs_reg dst_tmp_ud = retype(dst_tmp, >> BRW_REGISTER_TYPE_UD); >> - bld.MOV(dst_tmp_ud, subscript(dst_tmp, >> BRW_REGISTER_TYPE_UD, 0)); >> - bld.CMP(bld.null_reg_ud(), >> - dst_tmp_ud, brw_imm_ud(0), BRW_CONDITIONAL_NZ); >> - } > > So this is fixed now, great! :) I was very pleasantly surprised to see that it "just worked" and marked split instructions reading/writing the flag register with the correct quarter control bits :) >> - >> - /* Get the high 32-bit of each double component where the >> sign is */ >> - fs_reg result_int = retype(result, BRW_REGISTER_TYPE_UD); >> - bld.MOV(result_int, subscript(op[0], BRW_REGISTER_TYPE_UD, >> 1)); >> + fs_reg zero = vgrf(glsl_type::double_type); >> + bld.MOV(zero, setup_imm_df(bld, 0.0)); >> + bld.CMP(bld.null_reg_df(), op[0], zero, >> BRW_CONDITIONAL_NZ); >> >> - /* Get the sign bit */ >> - bld.AND(result_int, result_int, brw_imm_ud(0x80000000u)); >> + bld.MOV(result, zero); >> >> - /* Add 1.0 to the sign, predicated to skip the case of >> op[0] == 0.0 */ >> - inst = bld.OR(result_int, result_int, >> brw_imm_ud(0x3f800000u)); >> - inst->predicate = BRW_PREDICATE_NORMAL; >> + fs_reg r = subscript(result, BRW_REGISTER_TYPE_UD, 1); >> + bld.AND(r, subscript(op[0], BRW_REGISTER_TYPE_UD, 1), >> + brw_imm_ud(0x80000000u)); >> >> - /* Convert from 32-bit float to 64-bit double */ >> - result.type = BRW_REGISTER_TYPE_DF; >> - inst = bld.MOV(result, retype(result_int, >> BRW_REGISTER_TYPE_F)); >> + set_predicate(BRW_PREDICATE_NORMAL, >> + bld.OR(r, r, brw_imm_ud(0x3ff00000u))); > > This should be 0x3ff80000u, right? I think that's the high 4 bytes of the double value 1.5. 0x3ff00000 is for 1.0. (I just wrote some little test programs to confirm). > With that fixed: > Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> Thanks! _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev