On 04/24/2017 11:12 PM, Rob Clark wrote:
so I guess this is likely to hurt pipe drivers that don't (yet?) have
a real compiler backend.  (Ie. etnaviv and freedreno/a2xx.)  So maybe
it should be optional.

I can't say for these drivers, but it seems safer to add an option if this can hurt them. How about PIPE_SHADER_CAP_TGSI_MERGE_REGISTERS?


Also I wonder about the pre-llvm radeon gen's, since sb uses the
actual instruction encoding for IR between tgsi->sb and backend opt
passes..  iirc they have had problems when the tgsi code uses too many
registers.

BR,
-R

On Mon, Apr 24, 2017 at 5:01 PM, Samuel Pitoiset
<samuel.pitoi...@gmail.com> wrote:
The main goal of this pass to merge temporary registers in order
to reduce the total number of registers and also to produce
optimal TGSI code.

In fact, compilers seem to be confused when temporary variables
are already merged, maybe because it's done too early in the
process.

Removing the pass, reduce both the register pressure and the code
size (TGSI is no longer optimized, but who cares?).
shader-db results with RadeonSI and Nouveau are interesting.

Nouveau:

total instructions in shared programs : 3931608 -> 3929463 (-0.05%)
total gprs used in shared programs    : 481255 -> 479014 (-0.47%)
total local used in shared programs   : 27481 -> 27381 (-0.36%)
total bytes used in shared programs   : 36031256 -> 36011120 (-0.06%)

                 local        gpr       inst      bytes
     helped          14        1471        1309        1309
       hurt           1          88         384         384

RadeonSI:

PERCENTAGE DELTAS    Shaders     SGPRs     VGPRs SpillSGPR SpillVGPR  PrivVGPR  
 Scratch  CodeSize  MaxWaves     Waits
----------------------------------------------------------------------------------------------------------------------
All affected            4906   -0.31 %   -0.40 %   -2.93 %  -20.00 %     .      
-20.00 %   -0.18 %    0.19 %     .
----------------------------------------------------------------------------------------------------------------------
Total                  47109   -0.04 %   -0.05 %   -1.97 %   -7.14 %     .      
 -0.30 %   -0.03 %    0.02 %     .

Found by luck while fixing an issue in the TGSI dead code elimination
pass which affects tex instructions with bindless samplers.

Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
---
  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 62 ------------------------------
  1 file changed, 62 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index de7fe7837a..d033bdcc5a 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -565,7 +565,6 @@ public:
     int eliminate_dead_code(void);

     void merge_two_dsts(void);
-   void merge_registers(void);
     void renumber_registers(void);

     void emit_block_mov(ir_assignment *ir, const struct glsl_type *type,
@@ -5262,66 +5261,6 @@ glsl_to_tgsi_visitor::merge_two_dsts(void)
     }
  }

-/* Merges temporary registers together where possible to reduce the number of
- * registers needed to run a program.
- *
- * Produces optimal code only after copy propagation and dead code elimination
- * have been run. */
-void
-glsl_to_tgsi_visitor::merge_registers(void)
-{
-   int *last_reads = rzalloc_array(mem_ctx, int, this->next_temp);
-   int *first_writes = rzalloc_array(mem_ctx, int, this->next_temp);
-   struct rename_reg_pair *renames = rzalloc_array(mem_ctx, struct 
rename_reg_pair, this->next_temp);
-   int i, j;
-   int num_renames = 0;
-
-   /* Read the indices of the last read and first write to each temp register
-    * into an array so that we don't have to traverse the instruction list as
-    * much. */
-   for (i = 0; i < this->next_temp; i++) {
-      last_reads[i] = -1;
-      first_writes[i] = -1;
-   }
-   get_last_temp_read_first_temp_write(last_reads, first_writes);
-
-   /* Start looking for registers with non-overlapping usages that can be
-    * merged together. */
-   for (i = 0; i < this->next_temp; i++) {
-      /* Don't touch unused registers. */
-      if (last_reads[i] < 0 || first_writes[i] < 0) continue;
-
-      for (j = 0; j < this->next_temp; j++) {
-         /* Don't touch unused registers. */
-         if (last_reads[j] < 0 || first_writes[j] < 0) continue;
-
-         /* We can merge the two registers if the first write to j is after or
-          * in the same instruction as the last read from i.  Note that the
-          * register at index i will always be used earlier or at the same time
-          * as the register at index j. */
-         if (first_writes[i] <= first_writes[j] &&
-             last_reads[i] <= first_writes[j]) {
-            renames[num_renames].old_reg = j;
-            renames[num_renames].new_reg = i;
-            num_renames++;
-
-            /* Update the first_writes and last_reads arrays with the new
-             * values for the merged register index, and mark the newly unused
-             * register index as such. */
-            assert(last_reads[j] >= last_reads[i]);
-            last_reads[i] = last_reads[j];
-            first_writes[j] = -1;
-            last_reads[j] = -1;
-         }
-      }
-   }
-
-   rename_temp_registers(num_renames, renames);
-   ralloc_free(renames);
-   ralloc_free(last_reads);
-   ralloc_free(first_writes);
-}
-
  /* Reassign indices to temporary registers by reusing unused indices created
   * by optimization passes. */
  void
@@ -6712,7 +6651,6 @@ get_mesa_program_tgsi(struct gl_context *ctx,
     while (v->eliminate_dead_code());

     v->merge_two_dsts();
-   v->merge_registers();
     v->renumber_registers();

     /* Write the END instruction. */
--
2.12.2

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to