On 26/09/17 17:50, Kenneth Graunke wrote:
On Monday, September 25, 2017 6:23:13 PM PDT Timothy Arceri wrote:
For now linking is just removing unused varyings between stages.

shader-db results BDW:

total instructions in shared programs: 13198288 -> 13191693 (-0.05%)
instructions in affected programs: 48325 -> 41730 (-13.65%)
helped: 473
HURT: 0

total cycles in shared programs: 541184926 -> 541159260 (-0.00%)
cycles in affected programs: 213238 -> 187572 (-12.04%)
helped: 435
HURT: 8

V2:
- lower indirects on demoted inputs as well as outputs.
---
  src/mesa/drivers/dri/i965/brw_link.cpp | 56 ++++++++++++++++++++++++++++++++++
  1 file changed, 56 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp 
b/src/mesa/drivers/dri/i965/brw_link.cpp
index b7fab8d7a25..9ddf0230183 100644
--- a/src/mesa/drivers/dri/i965/brw_link.cpp
+++ b/src/mesa/drivers/dri/i965/brw_link.cpp
@@ -253,6 +253,62 @@ brw_link_shader(struct gl_context *ctx, struct 
gl_shader_program *shProg)
                                   compiler->scalar_stage[stage]);
     }
+ /* Determine first and last stage. */
+   unsigned first = MESA_SHADER_STAGES;
+   unsigned last = 0;
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+      if (!shProg->_LinkedShaders[i])
+         continue;
+      if (first == MESA_SHADER_STAGES)
+         first = i;
+      last = i;
+   }
+
+   /* Linking the stages in the opposite order (from fragment to vertex)
+    * ensures that inter-shader outputs written to in an earlier stage
+    * are eliminated if they are (transitively) not used in a later
+    * stage.
+    */
+    if (first != last) {
+       int next = last;
+       for (int i = next - 1; i >= 0; i--) {
+          if (shProg->_LinkedShaders[i] == NULL)
+             continue;
+
+            nir_shader *producer = shProg->_LinkedShaders[i]->Program->nir;
+            nir_shader *consumer = shProg->_LinkedShaders[next]->Program->nir;
+
+            nir_remove_dead_variables(producer, nir_var_shader_out);
+            nir_remove_dead_variables(consumer, nir_var_shader_in);
+
+            if (nir_remove_unused_varyings(producer, consumer)) {
+               nir_lower_global_vars_to_local(producer);
+               nir_lower_global_vars_to_local(consumer);
+
+               nir_variable_mode indirect_mask = (nir_variable_mode) 0;
+               if (compiler->glsl_compiler_options[i].EmitNoIndirectTemp)
+                  indirect_mask = (nir_variable_mode) nir_var_local;
+
+               /* The backend might not be able to handle indirects on
+                * temporaries so we need to lower indirects on any of the
+                * varyings we have demoted here.
+                */
+               nir_lower_indirect_derefs(producer, indirect_mask);
+               nir_lower_indirect_derefs(consumer, indirect_mask);
+
+               const bool p_is_scalar = 
compiler->scalar_stage[producer->stage];
+               shProg->_LinkedShaders[i]->Program->nir =
+                 brw_nir_optimize(producer, compiler, p_is_scalar);
+
+               const bool c_is_scalar = 
compiler->scalar_stage[producer->stage];
+               shProg->_LinkedShaders[next]->Program->nir =
+                 brw_nir_optimize(consumer, compiler, c_is_scalar);
+            }
+
+            next = i;
+       }
+    }
+
     for (stage = 0; stage < ARRAY_SIZE(shProg->_LinkedShaders); stage++) {
        struct gl_linked_shader *shader = shProg->_LinkedShaders[stage];
        if (!shader)

A couple more thoughts:

1. We're calling brw_nir_optimize even more now...it might make sense to
    drop it from brw_preprocess_nir, drop it here, and put it once in the
    final (third) loop, just before brw_shader_gather_info.  At least,
    it'd be interesting how that affects compile times / quality...
Yeah I can have a play around. As is it didn't really change shader-db 
compile times on my BDW so I wasn't too concerned.
2. It would be great to use this in anv as well, especially given that
    it has no GLSL IR linking to do this sort of stuff.  Plus, the SPIR-V
    might be compiled independently...and when you build the pipeline,
    you have all the stages and can do cross-stage linking optimizations
    like this.
Yeah the reason for doing this was to eventually add it to radv. i965 
was just a nicer place to test it, same goes for the passes that will 
follow. Right now the biggest blocked is justifying the added complexity 
as there isn't a standard shader-db like tool, and it's not making any 
noticeable differences in any benchmarks.
As far as anv I won't be doing any work on that directly, only radv.

Series looks great though, and is:
Reviewed-by: Kenneth Graunke <kenn...@whitecape.org>

Thanks Tim!

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to