Iago Toral <ito...@igalia.com> writes: > On Tue, 2016-08-02 at 18:27 -0700, Francisco Jerez wrote: >> Iago Toral Quiroga <ito...@igalia.com> writes: >> >> > >> > --- >> > src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 18 ++++++++++++++++++ >> > 1 file changed, 18 insertions(+) >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> > index 1525a3d..4014020 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> > @@ -1497,6 +1497,24 @@ vec4_visitor::nir_emit_alu(nir_alu_instr >> > *instr) >> > emit(CMP(dst, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ)); >> > break; >> > >> > + case nir_op_d2b: { >> > + /* two-argument instructions can't take 64-bit immediates */ >> > + dst_reg zero = dst_reg(this, glsl_type::dvec4_type); >> > + emit(MOV(zero, brw_imm_df(0.0))); >> > + >> > + dst_reg tmp = dst_reg(this, glsl_type::dvec4_type); >> > + emit(CMP(tmp, op[0], src_reg(zero), BRW_CONDITIONAL_NZ)); >> > + >> > + /* Convert the double CMP result to a single boolean result. >> > For that >> > + * we take the low 32-bit chunk of each DF component in the >> > result. >> > + * and do a final MOV to honor the original writemask >> > + */ >> > + dst_reg result = dst_reg(this, glsl_type::bvec4_type); >> > + emit(VEC4_OPCODE_PICK_LOW_32BIT, result, src_reg(tmp)); >> > + emit(MOV(dst, src_reg(result))); >> Couldn't you just do a single CMP instruction of the double-precision >> argument and a single-precision 0.0 immediate? > > It never occurred to me that we could do that but it seems to work just > fine. > >> I think you could >> potentially also use a 32bit destination type on the CMP instruction >> so >> you don't need to emit the PICK_LOW+MOV instructions afterwards. > > This does not seem to work though. > >> It may >> hit an instruction decompression bug but it could be cleaned up by >> the >> SIMD lowering pass afterwards, AFAICT the result would be two >> uncompressed instructions instead of the two uncompressed plus two >> compressed instructions above. > > I tried splitting the instruction just in case. Notice that doing this > requires that we improve the splitting pass to support splitting > instructions that write only one register (the original series did not > implement support for that because so far the only cared to split DF > instructions that wrote more than one register). I implemented a couple > of quick hacks to get this done so I could test this but the result is > still incorrect. > > For reference, if we don't split, the resulting CMP looks like this > (after full scalarization): > > cmp.nz.f0(8) g8<1>.xD g6<2,2,1>.xyxyDF 0F { align16 1Q }; > cmp.nz.f0(8) g8<1>.yD g6<2,2,1>.zwzwDF 0F { align16 1Q }; > cmp.nz.f0(8) g8<1>.zD g6.2<0,2,1>.xyxyDF 0F { align16 1Q }; > cmp.nz.f0(8) g8<1>.wD g6.2<0,2,1>.zwzwDF 0F { align16 1Q }; > > And if we split the instruction in two we produce this: > > cmp.nz.f0(4) g6<1>.xD g1<0,2,1>.xyxyDF 0F { align16 1N }; > cmp.nz.f0(4) g6<1>.yD g1<0,2,1>.zwzwDF 0F { align16 1N }; > cmp.nz.f0(4) g6<1>.zD g1.2<0,2,1>.xyxyDF 0F { align16 1N }; > cmp.nz.f0(4) g6<1>.wD g1.2<0,2,1>.zwzwDF 0F { align16 1N }; > cmp.nz.f0(4) g6.4<1>.xD g1<0,2,1>.xyxyDF 0F { align16 2N }; > cmp.nz.f0(4) g6.4<1>.yD g1<0,2,1>.zwzwDF 0F { align16 2N }; > cmp.nz.f0(4) g6.4<1>.zD g1.2<0,2,1>.xyxyDF 0F { align16 2N }; > cmp.nz.f0(4) g6.4<1>.wD g1.2<0,2,1>.zwzwDF 0F { align16 2N }; > > Both sequences of instructions look correct to me as far as the > splitting and DF regioning go, so I guess the 32-bit CMP result is not > doing what we want. > > I am not too surprised about this. Even if we use a 32b dst type I > suppose the instruction execution type is still 64b so I guess there > should be a 64b to 32b conversion involved in writing to a 32b result. > Seeing how 64b to 32b conversions require that we emit specific code > using ALIGN1 mode to deal with alignment requirements I guess it is not > too surprising that this does not work, right? >
Yeah, that's not too surprising, don't worry about it. Another idea would be to do something along the lines of 'MOV.nz null, <df source goes here>' to check whether the source is non-zero (so you avoid loading the immediate which is a non-trivial operation on Gen7), and then use a single-precision SEL instruction (or two predicated MOV instructions) to write true or false to the destination (so you don't need the PICK_LOW+MOV sequence). > I can send you a trace if you want to have a look at it. > > Iago > >> > >> > + break; >> > + } >> > + >> > case nir_op_i2b: >> > emit(CMP(dst, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ)); >> > break;
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev