Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes:

> From: "Juan A. Suarez Romero" <jasua...@igalia.com>
>
> Previous to Broadwell, we have 8 registers for MOV_INDIRECT. But if
> IVB/VLV deal with DFs, we will duplicate the exec_size from 8 to 16.
>
> This patch limits the SIMD width to 4 in this case.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index cfce364..45d320d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -4959,8 +4959,13 @@ get_lowered_simd_width(const struct gen_device_info *devinfo,
>        return MIN2(8, inst->exec_size);
>  
>     case SHADER_OPCODE_MOV_INDIRECT:
> -      /* Prior to Broadwell, we only have 8 address subregisters */
> -      return MIN3(devinfo->gen >= 8 ? 16 : 8,
> +      /* Prior to Broadwell, we only have 8 address subregisters. Special case
> +       * for IVB/VLV and DF types: set to 4 (exec_size will be later
> +       * duplicated).

The comment seems rather misleading: exec size doubling is unlikely to
have anything to do with this problem.

> +       */
> +      return MIN3(devinfo->gen >= 8 ? 16 : ((devinfo->gen == 7 &&
> +                                             !devinfo->is_haswell &&
> +                                             inst->exec_data_size() == 8) ? 4 : 8),
>                    2 * REG_SIZE / (inst->dst.stride * type_sz(inst->dst.type)),
>                    inst->exec_size);

I'm amazed that this works at all on HSW. According to the IVB and HSW
PRMs:

"2.When the destination requires two registers and the sources are
 indirect, the sources must use 1x1 regioning mode. In addition, the
 sources must be assembled from GRF registers each accessed by adjacent
 index registers in 1x1 regioning modes."

So for DF instructions the execution size is not limited by the number
of address registers you have available, but by the EU decompression
logic not handling VxH indirect addressing correctly.

I think this should be something along the lines of:

|   const unsigned max_size = (devinfo->gen >= 8 ? 2 : 1) * REG_SIZE;
|   return MIN3(devinfo->gen >= 8 ? 16 : 8,
|               max_size / (inst->dst.stride * type_sz(inst->dst.type)),
|               inst->exec_size);

>  
> -- 
> 2.9.3
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
