On Thu, 2017-03-23 at 12:01 -0700, Francisco Jerez wrote: > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > > > On Wed, 2017-03-22 at 15:47 -0700, Francisco Jerez wrote: > > > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > > > > > > > Generalize it to lower any unsupported narrower conversion. > > > > > > > > v2 (Curro): > > > > - Add supports_type_conversion() > > > > - Reuse existing intruction instead of cloning it. > > > > - Generalize d2x to narrower and equal size conversions. > > > > > > > > Signed-off-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com> > > > > --- > > > > src/intel/compiler/brw_fs.cpp | 11 ++-- > > > > src/intel/compiler/brw_fs_lower_d2x.cpp | 97 > > > > ++++++++++++++++++++++++--------- > > > > 2 files changed, 77 insertions(+), 31 deletions(-) > > > > > > > > diff --git a/src/intel/compiler/brw_fs.cpp > > > > b/src/intel/compiler/brw_fs.cpp > > > > index 6da23b17107..8612a83805d 100644 > > > > --- a/src/intel/compiler/brw_fs.cpp > > > > +++ b/src/intel/compiler/brw_fs.cpp > > > > @@ -5694,11 +5694,6 @@ fs_visitor::optimize() > > > > OPT(dead_code_eliminate); > > > > } > > > > > > > > - if (OPT(lower_d2x)) { > > > > - OPT(opt_copy_propagation); > > > > - OPT(dead_code_eliminate); > > > > - } > > > > - > > > > OPT(lower_simd_width); > > > > > > > > /* After SIMD lowering just in case we had to unroll the > > > > EOT > > > > send. */ > > > > @@ -5745,6 +5740,12 @@ fs_visitor::optimize() > > > > OPT(dead_code_eliminate); > > > > } > > > > > > > > + if (OPT(lower_d2x)) { > > > > + OPT(opt_copy_propagation); > > > > + OPT(dead_code_eliminate); > > > > + OPT(lower_simd_width); > > > > + } > > > > + > > > > lower_uniform_pull_constant_loads(); > > > > > > > > validate(); > > > > diff --git a/src/intel/compiler/brw_fs_lower_d2x.cpp > > > > b/src/intel/compiler/brw_fs_lower_d2x.cpp > > > > index a2db1154615..fa25d313dcd 100644 > > > > --- a/src/intel/compiler/brw_fs_lower_d2x.cpp > > > > +++ b/src/intel/compiler/brw_fs_lower_d2x.cpp > > > > @@ -27,47 +27,92 @@ > > > > > > > > using namespace brw; > > > > > > > > +static bool > > > > +supports_type_conversion(fs_inst *inst) { > > > > > > Pointer should be marked const. > > > > > > > + switch(inst->opcode) { > > > > + case BRW_OPCODE_MOV: > > > > + case SHADER_OPCODE_MOV_INDIRECT: > > > > + return true; > > > > + case BRW_OPCODE_SEL: > > > > + return false; > > > > > > I suggest you return 'inst->dst.type == get_exec_type_size(inst)' > > > here > > > in order to simplify the logic below and restructure things in a > > > way > > > that allows you to make opcode-specific decisions here based on > > > the > > > actual execution and destination types. > > > > I guess you meant: > > 'type_sz(inst->dst.type) == get_exec_type_size(inst)' > > > > Uhm, not really, you really want to compare the destination type with > the execution type of the instruction. >
I misunderstood you because of the name: it should be get_exec_type(inst) (i.e. without _size). I am going to create it. Thanks, Sam > > This would make it simpler but we then allow f2d conversion which > > is > > not allowed for SEL. > > > > > > > > > + default: > > > > + /* FIXME: We assume the opcodes don't explicitly > > > > mentioned > > > > + * before just work fine with arbitrary conversions. > > > > + */ > > > > + return true; > > > > + } > > > > +} > > > > + > > > > bool > > > > fs_visitor::lower_d2x() > > > > { > > > > bool progress = false; > > > > > > > > foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { > > > > - if (inst->opcode != BRW_OPCODE_MOV) > > > > - continue; > > > > + bool inst_support_conversion = > > > > supports_type_conversion(inst); > > > > + bool supported_conversion = > > > > + inst_support_conversion && > > > > + (get_exec_type_size(inst) != 8 || > > > > + type_sz(inst->dst.type) > 4 || > > > > + type_sz(inst->dst.type) >= > > > > get_exec_type_size(inst)); > > > > > > > > - if (inst->dst.type != BRW_REGISTER_TYPE_F && > > > > - inst->dst.type != BRW_REGISTER_TYPE_D && > > > > - inst->dst.type != BRW_REGISTER_TYPE_UD) > > > > + /* If the conversion is supported or there is no > > > > conversion > > > > then > > > > + * do nothing. > > > > + */ > > > > + if (supported_conversion || > > > > + (!inst_support_conversion && inst->dst.type == inst- > > > > > src[0].type) || > > > > > > You should be using the execution type here (and below where you > > > allocate the temp0 and temp1 vgrfs) instead of assuming it > > > matches > > > the > > > type of the first source. Also I'm not 100% certain that the > > > combination of this conditional continue, the one below, the > > > if/else > > > branches below and the supported_conversion definitions above > > > catch > > > the > > > exact set of cases where you need to apply lowering. This would > > > be a > > > lot easier to follow (most of the above goes away) if you did > > > something > > > along the lines of: > > > > > > > OK, thanks. > > > > > > if (supports_type_conversion(inst)) { > > > > if (get_exec_type_size(inst) == 8 && type_sz(inst->dst.type) > > > > < > > > > 8) { > > > > // Apply strided move workaround... > > > > progress = true; > > > > } > > > > } else { > > > > // Apply general conversion workaround... > > > > progress = true; > > > > } > > > > + inst->dst.file == BAD_FILE || inst->src[0].file == > > > > BAD_FILE) > > > > > > I don't think the 'inst->dst.file == BAD_FILE || inst- > > > >src[0].file == > > > BAD_FILE' terms of this expression are correct or useful, the > > > 'inst->src[0].file' term because there might be a destination > > > region > > > conversion regardless of what the file of the first source is, > > > the > > > 'inst->dst.file == BAD_FILE' term because you could just rely on > > > DCE > > > instead... > > > > > > > OK > > > > > > continue; > > > > > > > > - if (inst->src[0].type != BRW_REGISTER_TYPE_DF && > > > > - inst->src[0].type != BRW_REGISTER_TYPE_UQ && > > > > - inst->src[0].type != BRW_REGISTER_TYPE_Q) > > > > - continue; > > > > + /* This pass only supports conversion to narrower or > > > > equal > > > > size types. */ > > > > + if (get_exec_type_size(inst) < type_sz(inst->dst.type)) > > > > + continue; > > > > > > > > > > I think you need to apply this condition to the if branch below > > > that > > > deals with DF to F conversions *only*, the else branch you > > > probably > > > need > > > to apply regardless of whether the destination type is larger or > > > smaller > > > than the execution type. > > > > > > > OK > > > > > > - assert(inst->dst.file == VGRF); > > > > assert(inst->saturate == false); > > > > > > I don't see why it's safe to rely on saturate being false here, > > > in > > > any > > > case it would be straightforward to handle it by setting the flag > > > on > > > the > > > emitted MOV instruction and clearing it on the original > > > instruction. > > > > > > > OK > > > > > > - fs_reg dst = inst->dst; > > > > > > > > const fs_builder ibld(this, block, inst); > > > > + fs_reg dst = inst->dst; > > > > > > > > - /* From the Broadwell PRM, 3D Media GPGPU, "Double > > > > Precision > > > > Float to > > > > - * Single Precision Float": > > > > - * > > > > - * The upper Dword of every Qword will be written > > > > with > > > > undefined > > > > - * value when converting DF to F. > > > > - * > > > > - * So we need to allocate a temporary that's two > > > > registers, > > > > and then do > > > > - * a strided MOV to get the lower DWord of every Qword > > > > that > > > > has the > > > > - * result. > > > > - */ > > > > - fs_reg temp = ibld.vgrf(inst->src[0].type, 1); > > > > - fs_reg strided_temp = subscript(temp, inst->dst.type, > > > > 0); > > > > - ibld.MOV(strided_temp, inst->src[0]); > > > > - ibld.MOV(dst, strided_temp); > > > > + if (inst_support_conversion && !supported_conversion) { > > > > + /* From the Broadwell PRM, 3D Media GPGPU, "Double > > > > Precision Float to > > > > + * Single Precision Float": > > > > + * > > > > + * The upper Dword of every Qword will be written > > > > with > > > > undefined > > > > + * value when converting DF to F. > > > > + * > > > > + * So we need to allocate a temporary that's two > > > > registers, and then do > > > > + * a strided MOV to get the lower DWord of every > > > > Qword > > > > that has the > > > > + * result. > > > > + */ > > > > + fs_reg temp = ibld.vgrf(inst->src[0].type, 1); > > > > > > The '1' argument is redundant in all fs_builder::vgrf() calls > > > here > > > and > > > below. > > > > > > > OK > > > > > > + fs_reg strided_temp = subscript(temp, dst.type, 0); > > > > + > > > > + /* We clone the original instruction as we are going > > > > to > > > > modify it > > > > + * and emit another one after it. > > > > + */ > > > > > > Comment seems stale. > > > > > > > Right. > > > > > > + inst->dst = strided_temp; > > > > + /* As it is an strided destination, we write n-times > > > > more > > > > being n the > > > > + * size difference between source and destination > > > > types. > > > > Update > > > > > > I guess by difference you meant ratio here and in the copy-pasted > > > comment below (the comments also have a few minor spelling > > > mistakes > > > but > > > that's unlikely to cause as much confusion). > > > > > > > Right. > > > > > > + * size_written with the new destination. > > > > + */ > > > > + inst->size_written = inst->dst.component_size(inst- > > > > > exec_size); > > > > > > I think I'd add an assert(inst->size_written == > > > inst->dst.component_size(inst->exec_size)) right before you > > > assign > > > 'inst->dst' to point at the temporary in order to make sure you > > > aren't > > > accidentally miscompiling instructions that write multiple > > > components. > > > > > > > OK > > > > > > + ibld.at(block, inst->next).MOV(dst, strided_temp); > > > > + } else { > > > > + fs_reg temp0 = ibld.vgrf(inst->src[0].type, 1); > > > > + fs_reg temp1 = ibld.vgrf(inst->src[0].type, 1); > > > > + fs_reg strided_temp1 = subscript(temp1, dst.type, 0); > > > > + > > > > + inst->dst = temp0; > > > > + /* As it is an strided destination, we write n-times > > > > more > > > > being n the > > > > + * size difference between source and destination > > > > types. > > > > Update > > > > + * size_written with the new destination. > > > > + */ > > > > + inst->size_written = inst->dst.component_size(inst- > > > > > exec_size); > > > > > > > > > > > > - inst->remove(block); > > > > + /* Now, do the conversion to original destination's > > > > type. > > > > */ > > > > + fs_inst *mov = ibld.at(block, inst- > > > > > next).MOV(strided_temp1, temp0); > > > > > > > > + ibld.at(block, mov->next).MOV(dst, strided_temp1); > > > > > > Isn't this MOV and the strided_temp1 handling redundant? I'd > > > expect > > > it > > > to be handled automatically during the next loop iteration of > > > this > > > pass > > > (though you may have to switch back to foreach_block_and_inst() > > > instead > > > of foreach_block_and_inst_safe()), which would also result in > > > better > > > code because the extra move would only be emitted if this is an > > > actual > > > DF->F conversion. > > > > > > > Good catch! > > > > Thanks for all the suggestions, I am going to do them. > > > > Sam > > > > > > + } > > > > progress = true; > > > > } > > > > > > > > -- > > > > 2.11.0 > > > > > > > > _______________________________________________ > > > > mesa-dev mailing list > > > > mesa-dev@lists.freedesktop.org > > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: This is a digitally signed message part
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev