On 24.04.2017 at 23:12, Rob Clark wrote:
> so I guess this is likely to hurt pipe drivers that don't (yet?)
> have a real compiler backend. (Ie. etnaviv and freedreno/a2xx.) So
> maybe it should be optional.

I suppose softpipe, too? Though that's fine, no one cares if it gets a
bit slower. Might even be nicer for debugging purposes...

Roland

> Also I wonder about the pre-llvm radeon gen's, since sb uses the
> actual instruction encoding for IR between tgsi->sb and backend opt
> passes.. iirc they have had problems when the tgsi code uses too
> many registers.
>
> BR, -R
>
> On Mon, Apr 24, 2017 at 5:01 PM, Samuel Pitoiset
> <samuel.pitoi...@gmail.com> wrote:
>> The main goal of this pass is to merge temporary registers in order to
>> reduce the total number of registers and also to produce optimal
>> TGSI code.
>>
>> In fact, compilers seem to be confused when temporary variables are
>> already merged, maybe because it's done too early in the process.
>>
>> Removing the pass reduces both the register pressure and the code
>> size (TGSI is no longer optimized, but who cares?). shader-db
>> results with RadeonSI and Nouveau are interesting.
>>
>> Nouveau:
>>
>> total instructions in shared programs : 3931608 -> 3929463 (-0.05%)
>> total gprs used in shared programs    : 481255 -> 479014 (-0.47%)
>> total local used in shared programs   : 27481 -> 27381 (-0.36%)
>> total bytes used in shared programs   : 36031256 -> 36011120 (-0.06%)
>>
>>                 local        gpr       inst      bytes
>>     helped         14       1471       1309       1309
>>       hurt          1         88        384        384
>>
>> RadeonSI:
>>
>> PERCENTAGE DELTAS   Shaders     SGPRs     VGPRs SpillSGPR SpillVGPR  PrivVGPR   Scratch  CodeSize  MaxWaves     Waits
>> ---------------------------------------------------------------------------------------------------------------------
>> All affected           4906   -0.31 %   -0.40 %   -2.93 %  -20.00 %         .  -20.00 %   -0.18 %    0.19 %         .
>> ---------------------------------------------------------------------------------------------------------------------
>> Total                 47109   -0.04 %   -0.05 %   -1.97 %   -7.14 %         .   -0.30 %   -0.03 %    0.02 %         .
>>
>> Found by luck while fixing an issue in the TGSI dead code
>> elimination pass which affects tex instructions with bindless
>> samplers.
>>
>> Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
>> ---
>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 62 ------------------------------
>>  1 file changed, 62 deletions(-)
>>
>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> index de7fe7837a..d033bdcc5a 100644
>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> @@ -565,7 +565,6 @@ public:
>>     int eliminate_dead_code(void);
>>
>>     void merge_two_dsts(void);
>> -   void merge_registers(void);
>>     void renumber_registers(void);
>>
>>     void emit_block_mov(ir_assignment *ir, const struct glsl_type *type,
>> @@ -5262,66 +5261,6 @@ glsl_to_tgsi_visitor::merge_two_dsts(void)
>>     }
>>  }
>>
>> -/* Merges temporary registers together where possible to reduce the number of
>> - * registers needed to run a program.
>> - *
>> - * Produces optimal code only after copy propagation and dead code elimination
>> - * have been run. */
>> -void
>> -glsl_to_tgsi_visitor::merge_registers(void)
>> -{
>> -   int *last_reads = rzalloc_array(mem_ctx, int, this->next_temp);
>> -   int *first_writes = rzalloc_array(mem_ctx, int, this->next_temp);
>> -   struct rename_reg_pair *renames = rzalloc_array(mem_ctx, struct rename_reg_pair, this->next_temp);
>> -   int i, j;
>> -   int num_renames = 0;
>> -
>> -   /* Read the indices of the last read and first write to each temp register
>> -    * into an array so that we don't have to traverse the instruction list as
>> -    * much. */
>> -   for (i = 0; i < this->next_temp; i++) {
>> -      last_reads[i] = -1;
>> -      first_writes[i] = -1;
>> -   }
>> -   get_last_temp_read_first_temp_write(last_reads, first_writes);
>> -
>> -   /* Start looking for registers with non-overlapping usages that can be
>> -    * merged together. */
>> -   for (i = 0; i < this->next_temp; i++) {
>> -      /* Don't touch unused registers. */
>> -      if (last_reads[i] < 0 || first_writes[i] < 0) continue;
>> -
>> -      for (j = 0; j < this->next_temp; j++) {
>> -         /* Don't touch unused registers. */
>> -         if (last_reads[j] < 0 || first_writes[j] < 0) continue;
>> -
>> -         /* We can merge the two registers if the first write to j is after or
>> -          * in the same instruction as the last read from i.  Note that the
>> -          * register at index i will always be used earlier or at the same time
>> -          * as the register at index j. */
>> -         if (first_writes[i] <= first_writes[j] &&
>> -             last_reads[i] <= first_writes[j]) {
>> -            renames[num_renames].old_reg = j;
>> -            renames[num_renames].new_reg = i;
>> -            num_renames++;
>> -
>> -            /* Update the first_writes and last_reads arrays with the new
>> -             * values for the merged register index, and mark the newly unused
>> -             * register index as such. */
>> -            assert(last_reads[j] >= last_reads[i]);
>> -            last_reads[i] = last_reads[j];
>> -            first_writes[j] = -1;
>> -            last_reads[j] = -1;
>> -         }
>> -      }
>> -   }
>> -
>> -   rename_temp_registers(num_renames, renames);
>> -   ralloc_free(renames);
>> -   ralloc_free(last_reads);
>> -   ralloc_free(first_writes);
>> -}
>> -
>>  /* Reassign indices to temporary registers by reusing unused indices created
>>   * by optimization passes. */
>>  void
>> @@ -6712,7 +6651,6 @@ get_mesa_program_tgsi(struct gl_context *ctx,
>>     while (v->eliminate_dead_code());
>>
>>     v->merge_two_dsts();
>> -   v->merge_registers();
>>     v->renumber_registers();
>>
>>     /* Write the END instruction. */
>> --
>> 2.12.2
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
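
For anyone following the thread without the Mesa tree at hand, the merge
criterion used by the removed pass can be sketched as a standalone function.
This is only an illustration under stated assumptions, not the Mesa code:
RenamePair and merge_live_ranges are hypothetical names, the inputs are plain
vectors instead of the visitor's instruction list, and the j == i guard is an
addition of this sketch that the quoted pass does not have.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Mesa's rename_reg_pair.
struct RenamePair {
    int old_reg;
    int new_reg;
};

// Greedily fold register j into register i when j is first written at or
// after the last read of i. An entry of -1 marks an unused register.
// Both vectors are updated in place; the returned list records which
// registers were renamed.
std::vector<RenamePair> merge_live_ranges(std::vector<int> &first_writes,
                                          std::vector<int> &last_reads)
{
    assert(first_writes.size() == last_reads.size());
    const int n = static_cast<int>(first_writes.size());
    std::vector<RenamePair> renames;

    for (int i = 0; i < n; i++) {
        // Skip unused registers.
        if (last_reads[i] < 0 || first_writes[i] < 0)
            continue;

        for (int j = 0; j < n; j++) {
            // Skip unused registers; the j == i guard is specific to this sketch.
            if (j == i || last_reads[j] < 0 || first_writes[j] < 0)
                continue;

            if (first_writes[i] <= first_writes[j] &&
                last_reads[i] <= first_writes[j]) {
                renames.push_back({j, i});
                // Register i now lives until the last read of j,
                // and j is marked as unused.
                last_reads[i] = last_reads[j];
                first_writes[j] = -1;
                last_reads[j] = -1;
            }
        }
    }
    return renames;
}
```

With first writes {0, 1, 2, -1, 4} and last reads {2, 3, 4, -1, 5}, registers
2 and 4 are both folded into register 0, so two temporaries stay live instead
of four. The merged register 0 then spans instructions 0 to 5, which is the
kind of lengthened live range a backend register allocator has to work around.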