On 08/09/17 02:50, Francisco Jerez wrote: > Currently the liveness analysis pass would extend a live interval up > to the top of the program when no unconditional and complete > definition of the variable is found that dominates all of its uses. > > This can lead to a serious performance problem in shaders containing > many partial writes, like scalar arithmetic, FP64 and soon FP16 > operations.
Just tested with the VK_KHR_16bit_storage implementation. Works really fine with the most problematic tests, so we can drop the "i965/fs: Add reuse_16bit_conversions_register optimization" patch (that was already NAKed by both you and Connor). My test was limited to that extension CTS tests, but in case it helps: Tested-by: Alejandro Piñeiro <apinhe...@igalia.com> > The number of oversize live intervals in such workloads > can cause the compilation time of the shader to explode because of the > worse than quadratic behavior of the register allocator and scheduler > when running out of registers, and it can also cause the running time > of the shader to explode due to the amount of spilling it leads to, > which is orders of magnitude slower than GRF memory. > > This patch fixes it by computing the intersection of our current live > intervals with the subset of the program that can possibly be reached > from any definition of the variable. Extending the storage allocation > of the variable beyond that is pretty useless because its value is > guaranteed to be undefined at a point that cannot be reached from any > definition. > > No significant change in the running time of shader-db (with 5% > statistical significance). > > shader-db results on IVB: > > total cycles in shared programs: 61108780 -> 60932856 (-0.29%) > cycles in affected programs: 16335482 -> 16159558 (-1.08%) > helped: 5121 > HURT: 4347 > > total spills in shared programs: 1309 -> 1288 (-1.60%) > spills in affected programs: 249 -> 228 (-8.43%) > helped: 3 > HURT: 0 > > total fills in shared programs: 1652 -> 1597 (-3.33%) > fills in affected programs: 262 -> 207 (-20.99%) > helped: 4 > HURT: 0 > > LOST: 2 > GAINED: 209 > > shader-db results on BDW: > > total cycles in shared programs: 67617262 -> 67361220 (-0.38%) > cycles in affected programs: 23397142 -> 23141100 (-1.09%) > helped: 8045 > HURT: 6488 > > total spills in shared programs: 1456 -> 1252 (-14.01%) > spills in affected programs: 465 -> 261 (-43.87%) > helped: 3 > HURT: 0 > > total fills in shared programs: 1720 -> 1465 (-14.83%) > fills in affected programs: 471 -> 216 (-54.14%) > helped: 4 > HURT: 0 > > LOST: 2 > GAINED: 162 > > shader-db results on SKL: > > total cycles in shared programs: 65436248 -> 65245186 (-0.29%) > cycles in affected programs: 22560936 -> 22369874 (-0.85%) > helped: 8457 > HURT: 6247 > > total spills in shared programs: 437 -> 437 (0.00%) > spills in affected programs: 0 -> 0 > helped: 0 > HURT: 0 > > total fills in shared programs: 870 -> 854 (-1.84%) > fills in affected programs: 16 -> 0 > helped: 1 > HURT: 0 > > LOST: 0 > GAINED: 107 > --- > src/intel/compiler/brw_fs_live_variables.cpp | 34 > ++++++++++++++++++++++++---- > src/intel/compiler/brw_fs_live_variables.h | 12 ++++++++++ > 2 files changed, 42 insertions(+), 4 deletions(-) > > diff --git a/src/intel/compiler/brw_fs_live_variables.cpp > b/src/intel/compiler/brw_fs_live_variables.cpp > index c449672a519..059f076fa51 100644 > --- a/src/intel/compiler/brw_fs_live_variables.cpp > +++ b/src/intel/compiler/brw_fs_live_variables.cpp > @@ -83,9 +83,11 @@ fs_live_variables::setup_one_write(struct block_data *bd, > fs_inst *inst, > /* The def[] bitset marks when an initialization in a block completely > * screens off previous updates of that variable (VGRF channel). > */ > - if (inst->dst.file == VGRF && !inst->is_partial_write()) { > - if (!BITSET_TEST(bd->use, var)) > + if (inst->dst.file == VGRF) { > + if (!inst->is_partial_write() && !BITSET_TEST(bd->use, var)) > BITSET_SET(bd->def, var); > + > + BITSET_SET(bd->defout, var); > } > } > > @@ -199,6 +201,28 @@ fs_live_variables::compute_live_variables() > } > } > } > + > + /* Propagate defin and defout down the CFG to calculate the union of live > + * variables potentially defined along any possible control flow path. > + */ > + do { > + cont = false; > + > + foreach_block (block, cfg) { > + const struct block_data *bd = &block_data[block->num]; > + > + foreach_list_typed(bblock_link, child_link, link, &block->children) { > + struct block_data *child_bd = > &block_data[child_link->block->num]; > + > + for (int i = 0; i < bitset_words; i++) { > + const BITSET_WORD new_def = bd->defout[i] & > ~child_bd->defin[i]; > + child_bd->defin[i] |= new_def; > + child_bd->defout[i] |= new_def; > + cont |= new_def; > + } > + } > + } > + } while (cont); > } > > /** > @@ -212,12 +236,12 @@ fs_live_variables::compute_start_end() > struct block_data *bd = &block_data[block->num]; > > for (int i = 0; i < num_vars; i++) { > - if (BITSET_TEST(bd->livein, i)) { > + if (BITSET_TEST(bd->livein, i) && BITSET_TEST(bd->defin, i)) { > start[i] = MIN2(start[i], block->start_ip); > end[i] = MAX2(end[i], block->start_ip); > } > > - if (BITSET_TEST(bd->liveout, i)) { > + if (BITSET_TEST(bd->liveout, i) && BITSET_TEST(bd->defout, i)) { > start[i] = MIN2(start[i], block->end_ip); > end[i] = MAX2(end[i], block->end_ip); > } > @@ -260,6 +284,8 @@ fs_live_variables::fs_live_variables(fs_visitor *v, const > cfg_t *cfg) > block_data[i].use = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words); > block_data[i].livein = rzalloc_array(mem_ctx, BITSET_WORD, > bitset_words); > block_data[i].liveout = rzalloc_array(mem_ctx, BITSET_WORD, > bitset_words); > + block_data[i].defin = rzalloc_array(mem_ctx, BITSET_WORD, > bitset_words); > + block_data[i].defout = rzalloc_array(mem_ctx, BITSET_WORD, > bitset_words); > > block_data[i].flag_def[0] = 0; > block_data[i].flag_use[0] = 0; > diff --git a/src/intel/compiler/brw_fs_live_variables.h > b/src/intel/compiler/brw_fs_live_variables.h > index d2d5898ed1c..9e95e443170 100644 > --- a/src/intel/compiler/brw_fs_live_variables.h > +++ b/src/intel/compiler/brw_fs_live_variables.h > @@ -55,6 +55,18 @@ struct block_data { > /** Which defs reach the exit point of the block. */ > BITSET_WORD *liveout; > > + /** > + * Variables such that the entry point of the block may be reached from > any > + * of their definitions. > + */ > + BITSET_WORD *defin; > + > + /** > + * Variables such that the exit point of the block may be reached from any > + * of their definitions. > + */ > + BITSET_WORD *defout; > + > BITSET_WORD flag_def[1]; > BITSET_WORD flag_use[1]; > BITSET_WORD flag_livein[1]; _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev