On 07/03/17 22:38, Francisco Jerez wrote: > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > >> On 04/03/17 01:26, Francisco Jerez wrote: >>> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: >>> >>>> From: Matt Turner <matts...@gmail.com> >>>> >>>> Otherwise for a pack_double_2x32_split opcode, we emit: >>>> >>>> vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134 >>>> mov(8) g5<1>UD g5<4>.xUD { align16 >>>> 1Q compacted }; >>>> mov(8) g7<2>UD g5<4,4,1>UD { align1 >>>> 1Q }; >>>> ERROR: When the destination spans two registers, the source must >>>> span two registers >>>> (exceptions for scalar source and packed-word to >>>> packed-dword expansion) >>>> mov(8) g8<2>UD g5.4<4,4,1>UD { align1 >>>> 2N }; >>>> ERROR: The offset from the two source registers must be the same >>>> mov(8) g5<1>UD g6<4>.xUD { align16 >>>> 1Q compacted }; >>>> mov(8) g7.1<2>UD g5<4,4,1>UD { align1 >>>> 1Q }; >>>> ERROR: When the destination spans two registers, the source must >>>> span two registers >>>> (exceptions for scalar source and packed-word to >>>> packed-dword expansion) >>>> mov(8) g8.1<2>UD g5.4<4,4,1>UD { align1 >>>> 2N }; >>>> ERROR: The offset from the two source registers must be the same >>>> >>>> The intention was to emit mov(4)s for the instructions that have ERROR >>>> annotations. >>>> >>>> See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test >>>> for example. >>>> >>>> Reviewed-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com> >>>> --- >>>> src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 2 ++ >>>> 1 file changed, 2 insertions(+) >>>> >>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >>>> b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >>>> index b570792badd..f6034bc8b76 100644 >>>> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >>>> @@ -2025,6 +2025,7 @@ generate_code(struct brw_codegen *p, >>>> assert(type_sz(dst.type) == 8); >>>> >>>> brw_set_default_access_mode(p, BRW_ALIGN_1); >>>> + brw_set_default_exec_size(p, BRW_EXECUTE_4); >>>> >>> >>> NAK, we're missing a bug elsewhere if the exec_size coming in from the >>> IR is not accurate. You don't happen to be doubling the execution size >>> of this single-precision instruction, do you? >>> >> >> The exec_size from the IR is correct and Matt's patch too. >> >> The problem is that the original instruction >> (VEC4_OPCODE_SET_{HIGH,LOW}_32BIT) is a double-precision one because dst >> is DF, so we doubled the exec_size at the beginning of this function >> -generate_code()-. >> >> However, this opcode reads 32-bits elements from source and write them >> into the respective 32-bit position (low or high) of the dst, so we >> retyped the destination to UD to do it (see below), hence this >> instruction is actually a 32-bit one. >> >> We can either use Matt patch or add an exception to the doubling >> exec_size condition which is at the beginning of this function. In >> either case, I will add a comment explaining this, something like: >> >> /* Don't double exec_size for VEC4_OPCODE_SET_{HIGH,LOW}_32BIT >> * because dst is retyped to a 32-bit type to copy source's values, >> * so the instruction is effectively a 32-bit one. >> */ >> >> Which solution do you prefer? >> > > Hardcoding an execution size in the generator is not a real solution if > you're exposing execution sizes to the IR. I think the ultimate problem > here is that the execution type you deduce for this instruction while > doubling the exec size doesn't match the real execution type of the > instruction, because you pretend that it has a DF destination at the IR > level which is kind of a lie. > > This doesn't seem to be the only case where execution doubling may be a > problem BTW, you're likely to run into a similar issue with at least > VEC4_OPCODE_PICK_LOW_32BIT and VEC4_OPCODE_PICK_LOW_32BIT -- There may > be more. > > I think you could either fix the exec_size doubling logic to make it > aware of the real execution type of these single-precision instructions > that have fake DF types, use fake Q types at the IR level, or do > something more along the lines of this patch, but if you do the latter > you shouldn't be ignoring the incoming exec_size, and there are other > instructions you may have to fix case by case. >
I'm going to do the fix in the exec_size doubling logic. Thanks, Sam >> Sam >> >>>> dst = retype(dst, BRW_REGISTER_TYPE_UD); >>>> if (inst->opcode == VEC4_OPCODE_SET_HIGH_32BIT) >>>> @@ -2037,6 +2038,7 @@ generate_code(struct brw_codegen *p, >>>> src[0].hstride = BRW_HORIZONTAL_STRIDE_1; >>>> brw_MOV(p, dst, src[0]); >>>> >>>> + brw_set_default_exec_size(p, BRW_EXECUTE_8); >>>> brw_set_default_access_mode(p, BRW_ALIGN_16); >>>> break; >>>> } >>>> -- >>>> 2.11.0 >>>> >>>> _______________________________________________ >>>> mesa-dev mailing list >>>> mesa-dev@lists.freedesktop.org >>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: OpenPGP digital signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev