fs: Register allocator shoudn't use grf127 for sends dest

Caio Marcelo de Oliveira Filho Tue, 10 Jul 2018 16:09:01 -0700

> Since Gen8+ Intel PRM states that "r127 must not be used for return
> address when there is a src and dest overlap in send instruction."


The previous patch, that verifies the condition above is causing

    tests/spec/arb_compute_shader/linker/bug-93840.shader_test

to crash with

    shader_runner: ../src/intel/compiler/brw_fs_generator.cpp:2455: int 
fs_generator::generate_code(const cfg_t*, int): Assertion `validated' failed.

I also could reproduce the crash locally. It happens even with this
patch (which adds the hack) applied.


> This patch implements this restriction creating new grf127_send_hack_node
> at the register allocator. This node has a fixed assignation to grf127.
> 
> For vgrf that are used as destination of send messages we create node
> interfereces with the grf127_send_hack_node. So the register allocator
> will never assign to these vgrf a register that involves grf127.
> 
> If dispatch_width > 8 we don't create these interferences to the because
> all instructions have node interferences between sources and destination.
> That is enough to avoid the r127 restriction.

I think for both widths will not be enough. The instruction that fails
the validation is:

mov(8)          g126<1>UD       g0<8,8,1>UD                     { align1 WE_all 
1Q };
mov(1)          g126.2<1>UD     0x00000090UD                    { align1 WE_all 
1N };
send(16)        g126<1>UW       g126<8,8,1>UD
                            data ( DC OWORD block read, 253, 3) mlen 1 rlen 2 { 
align1 WE_all 1H };
        ERROR: r127 must not be used for return address when there is a src and 
dest overlap

Which if I understood correctly comes from the scratch reading being
created by the spilling logic. In brw_oword_block_read_scratch() we
see

   if (p->devinfo->gen >= 7) {
      /* On gen 7 and above, we no longer have message registers and we can
       * send from any register we want.  By using the destination register
       * for the message, we guarantee that the implied message write won't
       * accidentally overwrite anything.  This has been a problem because
       * the MRF registers and source for the final FB write are both fixed
       * and may overlap.
       */
      mrf = retype(dest, BRW_REGISTER_TYPE_UD);
   } else {
      mrf = retype(mrf, BRW_REGISTER_TYPE_UD);
   }
   dest = retype(dest, BRW_REGISTER_TYPE_UW);

It seems to me we'll have to handle r127 there as well.


Thanks,
Caio



> 
> This fixes CTS tests that raised this issue as they were executed as SIMD8:
> 
> dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom
> 
> Shader-db results on Skylake:
>    total instructions in shared programs: 7686798 -> 7686797 (<.01%)
>    instructions in affected programs: 301 -> 300 (-0.33%)
>    helped: 1
>    HURT: 0
> 
>    total cycles in shared programs: 337092322 -> 337091919 (<.01%)
>    cycles in affected programs: 22420415 -> 22420012 (<.01%)
>    helped: 712
>    HURT: 588
> 
> Shader-db results on Broadwell:
> 
>    total instructions in shared programs: 7658574 -> 7658625 (<.01%)
>    instructions in affected programs: 19610 -> 19661 (0.26%)
>    helped: 3
>    HURT: 4
> 
>    total cycles in shared programs: 340694553 -> 340676378 (<.01%)
>    cycles in affected programs: 24724915 -> 24706740 (-0.07%)
>    helped: 998
>    HURT: 916
> 
>    total spills in shared programs: 4300 -> 4311 (0.26%)
>    spills in affected programs: 333 -> 344 (3.30%)
>    helped: 1
>    HURT: 3
> 
>    total fills in shared programs: 5370 -> 5378 (0.15%)
>    fills in affected programs: 274 -> 282 (2.92%)
>    helped: 1
>    HURT: 3
> 
> v2: Avoid duplicating register classes without grf127. Let's use a node
>     with a fixed assignation to grf127 and create interferences to send
>     message vgrf destinations. (Eric Anholt)
> v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
>     (Jose Maria Casanova)
> ---
>  src/intel/compiler/brw_fs_reg_allocate.cpp | 25 ++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp 
> b/src/intel/compiler/brw_fs_reg_allocate.cpp
> index ec8e116cb38..59e047483c0 100644
> --- a/src/intel/compiler/brw_fs_reg_allocate.cpp
> +++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
> @@ -548,6 +548,9 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
> spill_all)
>     int first_mrf_hack_node = node_count;
>     if (devinfo->gen >= 7)
>        node_count += BRW_MAX_GRF - GEN7_MRF_HACK_START;
> +   int grf127_send_hack_node = node_count;
> +   if (devinfo->gen >= 8 && dispatch_width == 8)
> +      node_count ++;
>     struct ra_graph *g =
>        ra_alloc_interference_graph(compiler->fs_reg_sets[rsi].regs, 
> node_count);
>  
> @@ -653,6 +656,28 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
> spill_all)
>        }
>     }
>  
> +   if (devinfo->gen >= 8 && dispatch_width == 8) {
> +      /* At Intel Broadwell PRM, vol 07, section "Instruction Set Reference",
> +       * subsection "EUISA Instructions", Send Message (page 990):
> +       *
> +       * "r127 must not be used for return address when there is a src and
> +       * dest overlap in send instruction."
> +       *
> +       * We are avoiding using grf127 as part of the destination of send
> +       * messages adding a node interference to the grf127_send_hack_node.
> +       * This node has a fixed asignment to grf127.
> +       *
> +       * We don't apply it to SIMD16 because previous code avoids any 
> register
> +       * overlap between sources and destination.
> +       */
> +      ra_set_node_reg(g, grf127_send_hack_node, 127);
> +      foreach_block_and_inst(block, fs_inst, inst, cfg) {
> +         if (inst->is_send_from_grf() && inst->dst.file == VGRF) {
> +            ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node);
> +         }
> +      }
> +   }
> +
>     /* Debug of register spilling: Go spill everything. */
>     if (unlikely(spill_all)) {
>        int reg = choose_spill_reg(g);
> -- 
> 2.17.1
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/9] i965/fs: Register allocator shoudn't use grf127 for sends dest

Reply via email to