Re: [Mesa-dev] [PATCH v3 16/24] i965/vec4: fix SIMD-width lowering for VEC4_OPCODE_FROM_DOUBLE in IVB/BYT

Francisco Jerez Tue, 07 Mar 2017 13:32:36 -0800

Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes:

> On 04/03/17 01:04, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes:
>> 
>>> From: "Juan A. Suarez Romero" <jasua...@igalia.com>
>>>
>>> When splitting VEC4_OPCODE_FROM_DOUBLE in Ivybridge/Baytrail, the second
>>> part should use a temporal register, and then move the values to the
>>> second half of the original destination, so we get all the results in the
>>> same register.
>>>
>>> v2:
>>> - Fix typos (Matt).
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp           | 17 +++++++++++++----
>>>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  1 +
>>>  2 files changed, 14 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> index 64b435f3ec4..adcde085305 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> @@ -2191,9 +2191,15 @@ vec4_visitor::lower_simd_width()
>>>           linst->group = channel_offset;
>>>           linst->size_written = size_written;
>>>  
>>> +         /* When splitting VEC4_OPCODE_FROM_DOUBLE on Ivybridge, the 
>>> second part
>>> +          * should use in a temporal register. Later we will move the 
>>> values
>>> +          * to the second half of the original destination, so we get all 
>>> the
>>> +          * results in the same register. We use d2f_pass to detect this 
>>> case.
>>> +          */
>>> +         bool d2f_pass = (inst->opcode == VEC4_OPCODE_FROM_DOUBLE && n > 
>>> 0);
>>>           /* Compute split dst region */
>>>           dst_reg dst;
>>> -         if (needs_temp) {
>>> +         if (needs_temp || d2f_pass) {
>>>              unsigned num_regs = DIV_ROUND_UP(size_written, REG_SIZE);
>>>              dst = retype(dst_reg(VGRF, alloc.allocate(num_regs)),
>>>                           inst->dst.type);
>>> @@ -2226,9 +2232,12 @@ vec4_visitor::lower_simd_width()
>>>           /* If we used a temporary to store the result of the split
>>>            * instruction, copy the result to the original destination
>>>            */
>>> -         if (needs_temp) {
>>> -            vec4_instruction *mov =
>>> -               MOV(offset(inst->dst, lowered_width, n), src_reg(dst));
>>> +         if (needs_temp || d2f_pass) {
>>> +            vec4_instruction *mov;
>>> +            if (d2f_pass)
>>> +               mov = MOV(horiz_offset(inst->dst, n * 
>>> type_sz(inst->dst.type)), src_reg(dst));
>> 
>> I have no idea how this could possibly work...  horiz_offset() expects a
>> number of scalar components, not bytes.  Anyway I have a hunch this is
>> trying to workaround the bug I pointed out in PATCH 15...
>> 
>
> This worked by luck. We want an horizontal offset of 'lowered_width'
> components per split. As inst->dst.type is F, type_sz(inst->dst.type) =
> 4 and we set the maximum 'lowered_width' value to 4, it matches in the
> cases we tested... however, this is a bug.
>
> Actually it works too if we use offset() instead, so this hunk is not
> needed.
>


This patch as a whole makes no sense to me...  It's working around the
dubious behavior of VEC4_OPCODE_FROM_DOUBLE which doesn't behave like
any other arithmetic instruction of the back-end because it corrupts
data beyond its destination region, which means any reasoning done about
it within the optimizer is suspect of leading to data corruption in the
same way that you got the SIMD lowering pass to inadvertently cause
corruption.  The problem IMO is that you're pretending that something
that doesn't behave like a back-end instruction is an instruction.  I
suggest you split the datatype conversion from the gathering/scattering
of 32-bit fields as separate instructions -- Turns out that you already
have the latter implemented as VEC4_OPCODE_PICK/SET_LOW_32BIT.

Incidentally doing things that way would allow better performance
because the two operations would become visible to the scheduler and an
EU pipeline stall could potentially be avoided (though performance is by
no means my primary motivation for NAKing this patch, but rather
preserving our sanity in the long run...).

> Sam
>
>>> +            else
>>> +               mov = MOV(offset(inst->dst, lowered_width, n), 
>>> src_reg(dst));
>>>              mov->exec_size = lowered_width;
>>>              mov->group = channel_offset;
>>>              mov->size_written = size_written;
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
>>> index 7fa1afc9073..b570792badd 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
>>> @@ -1532,6 +1532,7 @@ generate_code(struct brw_codegen *p,
>>>               is_ivb_df);
>>>  
>>>        assert(inst->group % 8 == 0 ||
>>> +             (inst->exec_size == 4 && inst->group % 4 == 0) ||
>>>               inst->dst.type == BRW_REGISTER_TYPE_DF ||
>>>               inst->src[0].type == BRW_REGISTER_TYPE_DF ||
>>>               inst->src[1].type == BRW_REGISTER_TYPE_DF ||
>>> -- 
>>> 2.11.0
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v3 16/24] i965/vec4: fix SIMD-width lowering for VEC4_OPCODE_FROM_DOUBLE in IVB/BYT

Reply via email to