On 04/03/17 00:19, Francisco Jerez wrote: > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > >> Add support to SEL instruction and add an assert to detect unsupported >> instructions than do d2x conversions. >> >> Signed-off-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com> >> --- >> >> Curro, this patch legalizes SEL instruction too. If other optimizations >> modify later any SEL's (or any other instruction's) destination type >> (hence, producing a non-lowered d2x conversion), we can call it again >> around the end of fs_visitor::optimize(). Possibly together with >> lower_simd_width() just in case it was added later. >> > > This sounds rather scary... How do you make sure that this doesn't lead > to an infinite legalization-optimization loop in which copy propagation > reverses the effect of lower_d2x making double conversions illegal > again? If you do already, why do you need to run lower_d2x multiple > times? Wouldn't it be sufficient to run it once near the end of > optimize(), and then re-run copy propagation and possibly DCE? >
Right. I am going to change where we call it. >> For that reason there is the inst->dst.stride > 1 condition in the >> test. This detects if either we emitted a strided destination in >> purpose or it was as a result of a previous lower_d2x run, we don't >> want to lowered it. >> > The problem with this is that if you ended up with dst.stride > 1 due to > different fields of the same scalar quantity being defined by two > separate instructions (e.g. by using subscript(dst, ..., i)), you *need* > to apply the lowering pass regardless, because otherwise the second > instruction will corrupt the data written by the first instruction. > >> However, as I have not hit that case yet, I prefer to wait for your >> opinion. What do you think? >> >> >> src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp | 57 >> ++++++++++++++++++-------- >> 1 file changed, 41 insertions(+), 16 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp >> b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp >> index a2db1154615..330f2552929 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp >> @@ -33,17 +33,9 @@ fs_visitor::lower_d2x() >> bool progress = false; >> >> foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { >> - if (inst->opcode != BRW_OPCODE_MOV) >> - continue; >> - >> - if (inst->dst.type != BRW_REGISTER_TYPE_F && >> - inst->dst.type != BRW_REGISTER_TYPE_D && >> - inst->dst.type != BRW_REGISTER_TYPE_UD) >> - continue; >> - >> - if (inst->src[0].type != BRW_REGISTER_TYPE_DF && >> - inst->src[0].type != BRW_REGISTER_TYPE_UQ && >> - inst->src[0].type != BRW_REGISTER_TYPE_Q) >> + if (get_exec_type_size(inst) != 8 || >> + type_sz(inst->dst.type) >= get_exec_type_size(inst) || > > Note that some type conversion restrictions apply even if the execution > type is single-precision, and even if the destination type size is not > less than the execution type, e.g. according to the hardware docs SEL > doesn't support F->UD or F->DF conversions which the condition above > would consider okay. > Right. >> + inst->dst.stride > 1) >> continue; >> >> assert(inst->dst.file == VGRF); >> @@ -61,13 +53,46 @@ fs_visitor::lower_d2x() >> * So we need to allocate a temporary that's two registers, and then >> do >> * a strided MOV to get the lower DWord of every Qword that has the >> * result. >> + * >> + * This pass legalizes all the DF conversions to narrower types. >> */ >> - fs_reg temp = ibld.vgrf(inst->src[0].type, 1); >> - fs_reg strided_temp = subscript(temp, inst->dst.type, 0); >> - ibld.MOV(strided_temp, inst->src[0]); >> - ibld.MOV(dst, strided_temp); >> + switch (inst->opcode) { > > I suggest you refactor this into a helper function 'bool > supports_type_conversion(inst, dst_type, exec_type)' that returns false > for SEL and likely other things. It might be a useful thing to have in > other places, e.g. for late optimization passes like copy propagation > where we need to make sure that no additional illegal conversions are > introduced. If the value returned is false you'd do what you have below > for the SEL instruction, if it's true you'd do nothing unless the > instruction is double-precision and the destination type is smaller than > the execution type, in which case you'd do what you have below for > MOV/MOV_INDIRECT. > OK. I'm going to add it as static in the file, so we can promote it to a header file when it is used within other optimizations/lowerings. >> + case SHADER_OPCODE_MOV_INDIRECT: >> + case BRW_OPCODE_MOV: { >> + fs_reg temp = ibld.vgrf(inst->src[0].type, 1); >> + fs_reg strided_temp = subscript(temp, inst->dst.type, 0); >> + /* We clone the original instruction as we are going to modify it >> + * and emit another one after it. >> + */ >> + fs_inst *strided_inst = new(ibld.shader->mem_ctx) fs_inst(*inst); > > Why don't you just modify the original instruction instead of cloning > it, modifying the clone, and then removing the original? > Right but I did it because I need to emit the strided MOV after the instructions... which means adding support for this in fs_builder. As you prefer to reuse the original instruction, then I will send a patch implementing emit_after() functions in fs_builder and use them here. >> + strided_inst->dst = strided_temp; >> + /* As it is an strided destination, we write n-times more >> + * being n the size difference between source and destination >> types. >> + */ >> + strided_inst->size_written *= (type_sz(inst->src[0].type) / >> type_sz(inst->dst.type)); >> + ibld.emit(strided_inst); >> + ibld.MOV(dst, strided_temp); >> + /* Remove original instruction, now that is superseded. */ >> + inst->remove(block); >> + break; >> + } >> + case BRW_OPCODE_SEL: { >> + fs_reg temp0 = ibld.vgrf(inst->src[0].type, 1); >> + fs_reg strided_temp0 = subscript(temp0, inst->dst.type, 0); >> + fs_reg temp1 = ibld.vgrf(inst->src[1].type, 1); >> + fs_reg strided_temp1 = subscript(temp1, inst->dst.type, 0); >> >> - inst->remove(block); >> + /* Let's convert the operands to the destination type first */ >> + ibld.MOV(strided_temp0, inst->src[0]); >> + ibld.MOV(strided_temp1, inst->src[1]); >> + inst->src[0] = strided_temp0; >> + inst->src[1] = strided_temp1; > > Why don't you change the destination type to match the execution type so > you only need to emit one additional converting move instead of two when > there is a type mismatch? > This was to avoid defining emit_after() in fs_builder. But as you prefer that, then I will do it. Thanks for the review! :) Sam >> + break; >> + } >> + default: >> + /* FIXME: If this is not a supported instruction, then we need to >> support it. */ >> + unreachable("Not supported instruction"); >> + } >> progress = true; >> } >> >> -- >> 2.11.0
signature.asc
Description: OpenPGP digital signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev