On 16/03/17 04:23, Francisco Jerez wrote: > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > >> On 04/03/17 00:19, Francisco Jerez wrote: >>> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: >>> >>>> Add support to SEL instruction and add an assert to detect unsupported >>>> instructions than do d2x conversions. >>>> >>>> Signed-off-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com> >>>> --- >>>> >>>> Curro, this patch legalizes SEL instruction too. If other optimizations >>>> modify later any SEL's (or any other instruction's) destination type >>>> (hence, producing a non-lowered d2x conversion), we can call it again >>>> around the end of fs_visitor::optimize(). Possibly together with >>>> lower_simd_width() just in case it was added later. >>>> >>> >>> This sounds rather scary... How do you make sure that this doesn't lead >>> to an infinite legalization-optimization loop in which copy propagation >>> reverses the effect of lower_d2x making double conversions illegal >>> again? If you do already, why do you need to run lower_d2x multiple >>> times? Wouldn't it be sufficient to run it once near the end of >>> optimize(), and then re-run copy propagation and possibly DCE? >>> >>>> For that reason there is the inst->dst.stride > 1 condition in the >>>> test. This detects if either we emitted a strided destination in >>>> purpose or it was as a result of a previous lower_d2x run, we don't >>>> want to lowered it. >>>> >>> The problem with this is that if you ended up with dst.stride > 1 due to >>> different fields of the same scalar quantity being defined by two >>> separate instructions (e.g. by using subscript(dst, ..., i)), you *need* >>> to apply the lowering pass regardless, because otherwise the second >>> instruction will corrupt the data written by the first instruction. >>> >>>> However, as I have not hit that case yet, I prefer to wait for your >>>> opinion. What do you think? >>>> >>>> >>>> src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp | 57 >>>> ++++++++++++++++++-------- >>>> 1 file changed, 41 insertions(+), 16 deletions(-) >>>> >>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp >>>> b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp >>>> index a2db1154615..330f2552929 100644 >>>> --- a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp >>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp >>>> @@ -33,17 +33,9 @@ fs_visitor::lower_d2x() >>>> bool progress = false; >>>> >>>> foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { >>>> - if (inst->opcode != BRW_OPCODE_MOV) >>>> - continue; >>>> - >>>> - if (inst->dst.type != BRW_REGISTER_TYPE_F && >>>> - inst->dst.type != BRW_REGISTER_TYPE_D && >>>> - inst->dst.type != BRW_REGISTER_TYPE_UD) >>>> - continue; >>>> - >>>> - if (inst->src[0].type != BRW_REGISTER_TYPE_DF && >>>> - inst->src[0].type != BRW_REGISTER_TYPE_UQ && >>>> - inst->src[0].type != BRW_REGISTER_TYPE_Q) >>>> + if (get_exec_type_size(inst) != 8 || >>>> + type_sz(inst->dst.type) >= get_exec_type_size(inst) || >>> >>> Note that some type conversion restrictions apply even if the execution >>> type is single-precision, and even if the destination type size is not >>> less than the execution type, e.g. according to the hardware docs SEL >>> doesn't support F->UD or F->DF conversions which the condition above >>> would consider okay. >>> >>>> + inst->dst.stride > 1) >>>> continue; >>>> >>>> assert(inst->dst.file == VGRF); >>>> @@ -61,13 +53,46 @@ fs_visitor::lower_d2x() >>>> * So we need to allocate a temporary that's two registers, and >>>> then do >>>> * a strided MOV to get the lower DWord of every Qword that has the >>>> * result. >>>> + * >>>> + * This pass legalizes all the DF conversions to narrower types. >>>> */ >>>> - fs_reg temp = ibld.vgrf(inst->src[0].type, 1); >>>> - fs_reg strided_temp = subscript(temp, inst->dst.type, 0); >>>> - ibld.MOV(strided_temp, inst->src[0]); >>>> - ibld.MOV(dst, strided_temp); >>>> + switch (inst->opcode) { >>> >>> I suggest you refactor this into a helper function 'bool >>> supports_type_conversion(inst, dst_type, exec_type)' that returns false >>> for SEL and likely other things. It might be a useful thing to have in >>> other places, e.g. for late optimization passes like copy propagation >>> where we need to make sure that no additional illegal conversions are >>> introduced. If the value returned is false you'd do what you have below >>> for the SEL instruction, if it's true you'd do nothing unless the >>> instruction is double-precision and the destination type is smaller than >>> the execution type, in which case you'd do what you have below for >>> MOV/MOV_INDIRECT. >>> >> >> I have implemented the generalization of this pass following this >> suggestion, however I needed to disallow it for some specific cases: >> >> * Message related opcodes (samplers, urb writes, etc): they are not >> actually conversions between source registers and destination, so we >> cannot legalize them. >> * Opcodes like ASR, SHR, SHL where we reinterpret the result of the >> opcode, so no need to legalize it either. >> * I also restricted this legalization to opcodes where the arguments >> have all of them the same type. In some cases, we have mixed source >> arguments than will be either fixed in the generator (i.e. BFE/BFI2) or >> we don't know which of them use for the destination's data type in the >> legalization. I can add a TODO comment if you want. >> * This pass only legalizes conversions to narrower types. I have a patch >> to rename this pass to 'lower_narrow_conversions' (suggestions for >> better names are welcome). >> > > This sounds kind of strange to me... Because the type conversion > restrictions are opcode-specific, why do you need to care about things > you cannot even handle? >
I see two possibilities: 1) Legalize only the opcodes that we know are doing wrong things. This is the approach of this v3, where in the switch we have a case for MOV* and another for SEL opcodes. If more instructions behave wrongly, we add a new 'case' to the switch. 2) Legalize all opcodes except the ones we know the legalization makes no sense or it is wrong. This is the approach I mentioned in my previous email, explicitly listing the exceptions I found to avoid GPU hangs and regressions. I thought you preferred 2) in a previous email, because we can legalize undetected cases. If you prefer 1), I am OK (I have the patch too). Once you tell me which option you prefer, I will send the v4. Sam >> This change is a little bit scary because if we legalize the wrong case >> (for example a sampler message) we end up having regressions or, even >> worse, GPU hangs. Also, depending of the generation, some tests start to >> regress because some instructions are emitted slightly different. >> >> I think I fixed all the GPU hangs in all the generations I have access >> to and I don't see any regressions on Intel CI, but I am worried if this >> change could bite us again in the future (games, applications, new >> tests/opcodes, etc). >> >> If you ask me, I prefer to keep this lowering pass as it is, i.e. >> legalize only d2x conversions, to avoid blocking this series because of >> this. Once this patch series lands in master, I can send an RFC patch >> with this change and we could discuss how it can be done. >> >> What do you think? >> > > I don't think this patch is the only sketchy corner of the series, > aren't you planning on sending a v4 anyway? I suggest you just include > what you have written. > >> Sam >> >>>> + case SHADER_OPCODE_MOV_INDIRECT: >>>> + case BRW_OPCODE_MOV: { >>>> + fs_reg temp = ibld.vgrf(inst->src[0].type, 1); >>>> + fs_reg strided_temp = subscript(temp, inst->dst.type, 0); >>>> + /* We clone the original instruction as we are going to modify it >>>> + * and emit another one after it. >>>> + */ >>>> + fs_inst *strided_inst = new(ibld.shader->mem_ctx) fs_inst(*inst); >>> >>> Why don't you just modify the original instruction instead of cloning >>> it, modifying the clone, and then removing the original? >>> >>>> + strided_inst->dst = strided_temp; >>>> + /* As it is an strided destination, we write n-times more >>>> + * being n the size difference between source and destination >>>> types. >>>> + */ >>>> + strided_inst->size_written *= (type_sz(inst->src[0].type) / >>>> type_sz(inst->dst.type)); >>>> + ibld.emit(strided_inst); >>>> + ibld.MOV(dst, strided_temp); >>>> + /* Remove original instruction, now that is superseded. */ >>>> + inst->remove(block); >>>> + break; >>>> + } >>>> + case BRW_OPCODE_SEL: { >>>> + fs_reg temp0 = ibld.vgrf(inst->src[0].type, 1); >>>> + fs_reg strided_temp0 = subscript(temp0, inst->dst.type, 0); >>>> + fs_reg temp1 = ibld.vgrf(inst->src[1].type, 1); >>>> + fs_reg strided_temp1 = subscript(temp1, inst->dst.type, 0); >>>> >>>> - inst->remove(block); >>>> + /* Let's convert the operands to the destination type first */ >>>> + ibld.MOV(strided_temp0, inst->src[0]); >>>> + ibld.MOV(strided_temp1, inst->src[1]); >>>> + inst->src[0] = strided_temp0; >>>> + inst->src[1] = strided_temp1; >>> >>> Why don't you change the destination type to match the execution type so >>> you only need to emit one additional converting move instead of two when >>> there is a type mismatch? >>> >>>> + break; >>>> + } >>>> + default: >>>> + /* FIXME: If this is not a supported instruction, then we need >>>> to support it. */ >>>> + unreachable("Not supported instruction"); >>>> + } >>>> progress = true; >>>> } >>>> >>>> -- >>>> 2.11.0
signature.asc
Description: OpenPGP digital signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev