"Juan A. Suarez Romero" <jasua...@igalia.com> writes:

> On Mon, 2017-01-09 at 15:41 -0800, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes:
>> 
>> > From: "Juan A. Suarez Romero" <jasua...@igalia.com>
>> > 
>> > In IVB/VLV, for instructions dealing with DF, execsize will be
>> > duplicated in the final code.
>> > 
>> > So take this in account when checking if instructions should be
>> > split.
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 13 ++++++++++++-
>> >  1 file changed, 12 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > index 78f2124..cfce364 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > @@ -4551,8 +4551,15 @@ static unsigned
>> >  get_fpu_lowered_simd_width(const struct gen_device_info *devinfo,
>> >                             const fs_inst *inst)
>> >  {
>> > +   /* Note that in IVB/VLV for instructions that handles DF, we
>> > will duplicate
>> > +    * the exec_size. So take this value for calculus purposes.
>> > +    */
>> > +   unsigned exec_size = inst->exec_size;
>> > +   if (devinfo->gen == 7 && !devinfo->is_haswell && inst-
>> > >exec_data_size() == 8)
>> > +      exec_size *= 2;
>> > +
>> 
>> NAK.  This won't have any useful effect because the max_width value
>> doesn't actually have any influence in the regioning restriction
>> calculations below, it's only used to accumulate the intermediate
>> execution size limits derived from each hardware restriction.  I
>> don't
>> think you need to change any of the restrictions below for the result
>> to
>> be correct in presence of exec size doubling.
>
>
>
> Yes, you're right.
>
> Actually the motivation for the patch was to avoid ending up with a
> final exec_size > 8  (after doubling it in the generator) when we are
> generating a SIMD8 shader (or exec_size > 16 in SIMD16 shader).
>
> So if we have a exec_size == 8, and we are generating SIMD8 code, then
> we want to split in two instructions of exec_size==4, because at the
> end we will double the exec_size to 8. Same reasoning with SIMD16 code.
>
You're fine with such execution sizes as long asyou don't violate the 2
GRFs per source or destination region hardware limit (and a few other
crazy restrictions) which are only affected by the *effective* execution
size of the instruction and don't care at all about the units the
execution size is expressed in at the machine code level (e.g. the total
amount of data accessed by a source region of the instruction is the
same regardless of whether you consider it to be a sequence of N
elements of size S or a sequence of 2*N elements of size S/2).

>
> This could be done as:
>
> @@ -5119,7 +5119,11 @@ fs_visitor::lower_simd_width()
>     bool progress = false;
>  
>     foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
> -      const unsigned lower_width = get_lowered_simd_width(devinfo,
> inst);
> +      unsigned lower_width = get_lowered_simd_width(devinfo, inst);
> +
> +      if (devinfo->gen == 7 && !devinfo->is_haswell &&
> +          (get_exec_type_size(inst) == 8 || type_sz(inst->dst.type) ==
> 8))
> +         lower_width = MIN2(lower_width, dispatch_width / 2);
>

That wouldn't do any good...  Let's just drop this patch.

>
> Now, this is all in theory. So far, no test has exposed this
> situation with this change of top of our branch, and the lower_width
> returned by get_lowered_simd_width() is always less than 8. But maybe
> this could happen in future, for instance, if we emit an instruction
> that writes more than 4 DF and has force_writemask_all == TRUE.
>
>> 
>> >     /* Maximum execution size representable in the instruction controls. */
>> > -   unsigned max_width = MIN2(32, inst->exec_size);
>> > +   unsigned max_width = MIN2(32, exec_size);
>> >  
>> >     /* According to the PRMs:
>> >      *  "A. In Direct Addressing mode, a source cannot span more than 2
>> > @@ -4656,6 +4663,10 @@ get_fpu_lowered_simd_width(const struct 
>> > gen_device_info *devinfo,
>> >           max_width = MIN2(max_width, channels_per_grf);
>> >     }
>> >  
>> > +   /* If we have duplicated exec_size, then readjust max_width if 
>> > required. */
>> > +   if (exec_size != inst->exec_size && max_width == exec_size)
>> > +      max_width = inst->exec_size;
>> > +
>> >     /* Only power-of-two execution sizes are representable in the 
>> > instruction
>> >      * control fields.
>> >      */
>> > -- 
>> > 2.9.3
>> > 
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> 
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Attachment: signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to