On 14/03/17 11:17, Samuel Iglesias Gonsálvez wrote: > > > On 04/03/17 00:19, Francisco Jerez wrote: >> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: >> >>> Add support to SEL instruction and add an assert to detect unsupported >>> instructions than do d2x conversions. >>> >>> Signed-off-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com> >>> --- >>> >>> Curro, this patch legalizes SEL instruction too. If other optimizations >>> modify later any SEL's (or any other instruction's) destination type >>> (hence, producing a non-lowered d2x conversion), we can call it again >>> around the end of fs_visitor::optimize(). Possibly together with >>> lower_simd_width() just in case it was added later. >>> >> >> This sounds rather scary... How do you make sure that this doesn't lead >> to an infinite legalization-optimization loop in which copy propagation >> reverses the effect of lower_d2x making double conversions illegal >> again? If you do already, why do you need to run lower_d2x multiple >> times? Wouldn't it be sufficient to run it once near the end of >> optimize(), and then re-run copy propagation and possibly DCE? >> >>> For that reason there is the inst->dst.stride > 1 condition in the >>> test. This detects if either we emitted a strided destination in >>> purpose or it was as a result of a previous lower_d2x run, we don't >>> want to lowered it. >>> >> The problem with this is that if you ended up with dst.stride > 1 due to >> different fields of the same scalar quantity being defined by two >> separate instructions (e.g. by using subscript(dst, ..., i)), you *need* >> to apply the lowering pass regardless, because otherwise the second >> instruction will corrupt the data written by the first instruction. >> >>> However, as I have not hit that case yet, I prefer to wait for your >>> opinion. What do you think? >>> >>> >>> src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp | 57 >>> ++++++++++++++++++-------- >>> 1 file changed, 41 insertions(+), 16 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp >>> b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp >>> index a2db1154615..330f2552929 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp >>> @@ -33,17 +33,9 @@ fs_visitor::lower_d2x() >>> bool progress = false; >>> >>> foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { >>> - if (inst->opcode != BRW_OPCODE_MOV) >>> - continue; >>> - >>> - if (inst->dst.type != BRW_REGISTER_TYPE_F && >>> - inst->dst.type != BRW_REGISTER_TYPE_D && >>> - inst->dst.type != BRW_REGISTER_TYPE_UD) >>> - continue; >>> - >>> - if (inst->src[0].type != BRW_REGISTER_TYPE_DF && >>> - inst->src[0].type != BRW_REGISTER_TYPE_UQ && >>> - inst->src[0].type != BRW_REGISTER_TYPE_Q) >>> + if (get_exec_type_size(inst) != 8 || >>> + type_sz(inst->dst.type) >= get_exec_type_size(inst) || >> >> Note that some type conversion restrictions apply even if the execution >> type is single-precision, and even if the destination type size is not >> less than the execution type, e.g. according to the hardware docs SEL >> doesn't support F->UD or F->DF conversions which the condition above >> would consider okay. >> >>> + inst->dst.stride > 1) >>> continue; >>> >>> assert(inst->dst.file == VGRF); >>> @@ -61,13 +53,46 @@ fs_visitor::lower_d2x() >>> * So we need to allocate a temporary that's two registers, and then >>> do >>> * a strided MOV to get the lower DWord of every Qword that has the >>> * result. >>> + * >>> + * This pass legalizes all the DF conversions to narrower types. >>> */ >>> - fs_reg temp = ibld.vgrf(inst->src[0].type, 1); >>> - fs_reg strided_temp = subscript(temp, inst->dst.type, 0); >>> - ibld.MOV(strided_temp, inst->src[0]); >>> - ibld.MOV(dst, strided_temp); >>> + switch (inst->opcode) { >> >> I suggest you refactor this into a helper function 'bool >> supports_type_conversion(inst, dst_type, exec_type)' that returns false >> for SEL and likely other things. It might be a useful thing to have in >> other places, e.g. for late optimization passes like copy propagation >> where we need to make sure that no additional illegal conversions are >> introduced. If the value returned is false you'd do what you have below >> for the SEL instruction, if it's true you'd do nothing unless the >> instruction is double-precision and the destination type is smaller than >> the execution type, in which case you'd do what you have below for >> MOV/MOV_INDIRECT. >> > > I have implemented the generalization of this pass following this > suggestion, however I needed to disallow it for some specific cases: > > * Message related opcodes (samplers, urb writes, etc): they are not > actually conversions between source registers and destination, so we > cannot legalize them. > * Opcodes like ASR, SHR, SHL where we reinterpret the result of the > opcode, so no need to legalize it either. > * I also restricted this legalization to opcodes where the arguments > have all of them the same type. In some cases, we have mixed source > arguments than will be either fixed in the generator (i.e. BFE/BFI2) or > we don't know which of them use for the destination's data type in the > legalization. I can add a TODO comment if you want. > * This pass only legalizes conversions to narrower types. I have a patch > to rename this pass to 'lower_narrow_conversions' (suggestions for > better names are welcome). >
Correction: actually this pass legalizes conversions to narrower and equal size types. I need to find a better name for it :-( Sam > This change is a little bit scary because if we legalize the wrong case > (for example a sampler message) we end up having regressions or, even > worse, GPU hangs. Also, depending of the generation, some tests start to > regress because some instructions are emitted slightly different. > > I think I fixed all the GPU hangs in all the generations I have access > to and I don't see any regressions on Intel CI, but I am worried if this > change could bite us again in the future (games, applications, new > tests/opcodes, etc). > > If you ask me, I prefer to keep this lowering pass as it is, i.e. > legalize only d2x conversions, to avoid blocking this series because of > this. Once this patch series lands in master, I can send an RFC patch > with this change and we could discuss how it can be done. > > What do you think? > > Sam > >>> + case SHADER_OPCODE_MOV_INDIRECT: >>> + case BRW_OPCODE_MOV: { >>> + fs_reg temp = ibld.vgrf(inst->src[0].type, 1); >>> + fs_reg strided_temp = subscript(temp, inst->dst.type, 0); >>> + /* We clone the original instruction as we are going to modify it >>> + * and emit another one after it. >>> + */ >>> + fs_inst *strided_inst = new(ibld.shader->mem_ctx) fs_inst(*inst); >> >> Why don't you just modify the original instruction instead of cloning >> it, modifying the clone, and then removing the original? >> >>> + strided_inst->dst = strided_temp; >>> + /* As it is an strided destination, we write n-times more >>> + * being n the size difference between source and destination >>> types. >>> + */ >>> + strided_inst->size_written *= (type_sz(inst->src[0].type) / >>> type_sz(inst->dst.type)); >>> + ibld.emit(strided_inst); >>> + ibld.MOV(dst, strided_temp); >>> + /* Remove original instruction, now that is superseded. */ >>> + inst->remove(block); >>> + break; >>> + } >>> + case BRW_OPCODE_SEL: { >>> + fs_reg temp0 = ibld.vgrf(inst->src[0].type, 1); >>> + fs_reg strided_temp0 = subscript(temp0, inst->dst.type, 0); >>> + fs_reg temp1 = ibld.vgrf(inst->src[1].type, 1); >>> + fs_reg strided_temp1 = subscript(temp1, inst->dst.type, 0); >>> >>> - inst->remove(block); >>> + /* Let's convert the operands to the destination type first */ >>> + ibld.MOV(strided_temp0, inst->src[0]); >>> + ibld.MOV(strided_temp1, inst->src[1]); >>> + inst->src[0] = strided_temp0; >>> + inst->src[1] = strided_temp1; >> >> Why don't you change the destination type to match the execution type so >> you only need to emit one additional converting move instead of two when >> there is a type mismatch? >> >>> + break; >>> + } >>> + default: >>> + /* FIXME: If this is not a supported instruction, then we need to >>> support it. */ >>> + unreachable("Not supported instruction"); >>> + } >>> progress = true; >>> } >>> >>> -- >>> 2.11.0 >
signature.asc
Description: OpenPGP digital signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev