Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > On 04/03/17 01:04, Francisco Jerez wrote: >> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: >> >>> From: "Juan A. Suarez Romero" <jasua...@igalia.com> >>> >>> When splitting VEC4_OPCODE_FROM_DOUBLE in Ivybridge/Baytrail, the second >>> part should use a temporal register, and then move the values to the >>> second half of the original destination, so we get all the results in the >>> same register. >>> >>> v2: >>> - Fix typos (Matt). >>> --- >>> src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 +++++++++++++---- >>> src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 1 + >>> 2 files changed, 14 insertions(+), 4 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> index 64b435f3ec4..adcde085305 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> @@ -2191,9 +2191,15 @@ vec4_visitor::lower_simd_width() >>> linst->group = channel_offset; >>> linst->size_written = size_written; >>> >>> + /* When splitting VEC4_OPCODE_FROM_DOUBLE on Ivybridge, the >>> second part >>> + * should use in a temporal register. Later we will move the >>> values >>> + * to the second half of the original destination, so we get all >>> the >>> + * results in the same register. We use d2f_pass to detect this >>> case. >>> + */ >>> + bool d2f_pass = (inst->opcode == VEC4_OPCODE_FROM_DOUBLE && n > >>> 0); >>> /* Compute split dst region */ >>> dst_reg dst; >>> - if (needs_temp) { >>> + if (needs_temp || d2f_pass) { >>> unsigned num_regs = DIV_ROUND_UP(size_written, REG_SIZE); >>> dst = retype(dst_reg(VGRF, alloc.allocate(num_regs)), >>> inst->dst.type); >>> @@ -2226,9 +2232,12 @@ vec4_visitor::lower_simd_width() >>> /* If we used a temporary to store the result of the split >>> * instruction, copy the result to the original destination >>> */ >>> - if (needs_temp) { >>> - vec4_instruction *mov = >>> - MOV(offset(inst->dst, lowered_width, n), src_reg(dst)); >>> + if (needs_temp || d2f_pass) { >>> + vec4_instruction *mov; >>> + if (d2f_pass) >>> + mov = MOV(horiz_offset(inst->dst, n * >>> type_sz(inst->dst.type)), src_reg(dst)); >> >> I have no idea how this could possibly work... horiz_offset() expects a >> number of scalar components, not bytes. Anyway I have a hunch this is >> trying to workaround the bug I pointed out in PATCH 15... >> > > This worked by luck. We want an horizontal offset of 'lowered_width' > components per split. As inst->dst.type is F, type_sz(inst->dst.type) = > 4 and we set the maximum 'lowered_width' value to 4, it matches in the > cases we tested... however, this is a bug. > > Actually it works too if we use offset() instead, so this hunk is not > needed. >
This patch as a whole makes no sense to me... It's working around the dubious behavior of VEC4_OPCODE_FROM_DOUBLE which doesn't behave like any other arithmetic instruction of the back-end because it corrupts data beyond its destination region, which means any reasoning done about it within the optimizer is suspect of leading to data corruption in the same way that you got the SIMD lowering pass to inadvertently cause corruption. The problem IMO is that you're pretending that something that doesn't behave like a back-end instruction is an instruction. I suggest you split the datatype conversion from the gathering/scattering of 32-bit fields as separate instructions -- Turns out that you already have the latter implemented as VEC4_OPCODE_PICK/SET_LOW_32BIT. Incidentally doing things that way would allow better performance because the two operations would become visible to the scheduler and an EU pipeline stall could potentially be avoided (though performance is by no means my primary motivation for NAKing this patch, but rather preserving our sanity in the long run...). > Sam > >>> + else >>> + mov = MOV(offset(inst->dst, lowered_width, n), >>> src_reg(dst)); >>> mov->exec_size = lowered_width; >>> mov->group = channel_offset; >>> mov->size_written = size_written; >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >>> b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >>> index 7fa1afc9073..b570792badd 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >>> @@ -1532,6 +1532,7 @@ generate_code(struct brw_codegen *p, >>> is_ivb_df); >>> >>> assert(inst->group % 8 == 0 || >>> + (inst->exec_size == 4 && inst->group % 4 == 0) || >>> inst->dst.type == BRW_REGISTER_TYPE_DF || >>> inst->src[0].type == BRW_REGISTER_TYPE_DF || >>> inst->src[1].type == BRW_REGISTER_TYPE_DF || >>> -- >>> 2.11.0 >>> >>> _______________________________________________ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev