On Wed, Dec 17, 2014 at 2:23 PM, Connor Abbott <cwabbo...@gmail.com> wrote:
> On Wed, Dec 17, 2014 at 5:11 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> > On Wed, Dec 17, 2014 at 1:52 PM, Connor Abbott <cwabbo...@gmail.com> wrote:
> >>
> >> I'm sure you're already aware, but there are two things we could do to
> >> speed this up:
> >>
> >> 1. Pre-compute def/kill sets for each block similar to what i965 does.
> >
> > Sure, but we walk the instructions at most (deepest block depth + 1) times,
> > and the deepest we've ever seen in the wild is 2.
>
> Well, there's always orbital explorer ;)
>
> >> 2. Use a worklist + an array of flags for "this block is in the
> >> worklist" rather than walking all the basic blocks in reverse to find
> >> the few we need to update.
> >
> > Sure, we could, but I don't see how pushing the blocks onto a stack and
> > then popping them back off really gains us anything over just walking
> > them.  If there's something I'm missing, please let me know because it's
> > not jumping out at me.  I'll freely admit that I'm not terribly familiar
> > with worklists.
>
> I didn't mean a stack, I meant a FIFO queue. The idea is that rather
> than walking through every instruction even when there are only a few
> changes to be made, we pop a pointer to a block off the worklist, see
> if its live-in set changed, and if it did, we push every predecessor
> not already in the worklist onto the worklist and then mark the block
> as not being in the worklist. We start by pushing all the blocks onto
> the worklist in reverse order, since that's the most efficient
> ordering. Then, we're only processing blocks that we know might change
> rather than every possible one, and for control flow without loops we
> only walk the instructions once instead of twice. I think this is
> going to be a pretty common thing for optimization passes to do, so it
> would be nice to have a common implementation of the worklist.
Reading through that a second time, it makes more sense now.  Sure, we could
do it that way, and it is more efficient.  However, that's also a pretty
solid rewrite, so I think it'll have to wait until more pressing things get
done.
--Jason

> >
> >>
> >> Wrt #2, we already use a worklist in the DCE pass, but it's kinda lame
> >> because it's using a linked list when we could just allocate an array
> >> of pointers up-front based on the maximum size (the number of blocks
> >> in this case, the number of SSA definitions in that case) and use it
> >> as a ringbuffer. It would be a nice cleanup to implement such a
> >> bounded worklist and share it between these two passes, since we'll
> >> probably want to use it for lots of other passes too.
> >>
> >> I don't think either thing should block merging this, though.
> >>
> >> A few other comments below.
> >>
> >> On Tue, Dec 16, 2014 at 1:05 AM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> >> > ---
> >> >  src/glsl/Makefile.sources         |   1 +
> >> >  src/glsl/nir/nir.h                |  13 ++
> >> >  src/glsl/nir/nir_live_variables.c | 282 ++++++++++++++++++++++++++++++++++++++
> >> >  src/glsl/nir/nir_metadata.c       |   2 +
> >> >  src/mesa/main/bitset.h            |   1 +
> >> >  5 files changed, 299 insertions(+)
> >> >  create mode 100644 src/glsl/nir/nir_live_variables.c
> >> >
> >> > diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> >> > index 4eb6320..433224e 100644
> >> > --- a/src/glsl/Makefile.sources
> >> > +++ b/src/glsl/Makefile.sources
> >> > @@ -20,6 +20,7 @@ NIR_FILES = \
> >> >         $(GLSL_SRCDIR)/nir/nir_from_ssa.c \
> >> >         $(GLSL_SRCDIR)/nir/nir_intrinsics.c \
> >> >         $(GLSL_SRCDIR)/nir/nir_intrinsics.h \
> >> > +       $(GLSL_SRCDIR)/nir/nir_live_variables.c \
> >> >         $(GLSL_SRCDIR)/nir/nir_lower_atomics.c \
> >> >         $(GLSL_SRCDIR)/nir/nir_lower_samplers.cpp \
> >> >         $(GLSL_SRCDIR)/nir/nir_lower_system_values.c \
> >> > diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> >> > index f744736..2f2edb6 100644
> >> > --- a/src/glsl/nir/nir.h
> >> > +++ b/src/glsl/nir/nir.h
> >> > @@ -420,6 +420,9 @@ typedef struct {
> >> >     /** generic SSA definition index. */
> >> >     unsigned index;
> >> >
> >> > +   /** Index into the live_in and live_out bitfields */
> >> > +   unsigned live_index;
> >> > +
> >> >     nir_instr *parent_instr;
> >> >
> >> >     struct set *uses;
> >> > @@ -999,6 +1002,10 @@ typedef struct nir_block {
> >> >     struct nir_block **dom_children;
> >> >
> >> >     struct set *dom_frontier;
> >> > +
> >> > +   /* live in and out for this block; used for liveness analysis */
> >> > +   BITSET_WORD *live_in;
> >> > +   BITSET_WORD *live_out;
> >> >  } nir_block;
> >> >
> >> >  #define nir_block_first_instr(block) \
> >> > @@ -1047,6 +1054,7 @@ typedef enum {
> >> >     nir_metadata_none = 0x0,
> >> >     nir_metadata_block_index = 0x1,
> >> >     nir_metadata_dominance = 0x2,
> >> > +   nir_metadata_live_variables = 0x4,
> >> >  } nir_metadata;
> >> >
> >> >  typedef struct {
> >> > @@ -1274,6 +1282,8 @@ bool nir_foreach_src(nir_instr *instr, nir_foreach_src_cb cb, void *state);
> >> >  typedef bool (*nir_foreach_block_cb)(nir_block *block, void *state);
> >> >  bool nir_foreach_block(nir_function_impl *impl, nir_foreach_block_cb cb,
> >> >                        void *state);
> >> > +bool nir_foreach_block_reverse(nir_function_impl *impl, nir_foreach_block_cb cb,
> >> > +                               void *state);
> >>
> >> This hunk should go in the previous patch that defined this function.
> >>
> >> >
> >> >  /* If the following CF node is an if, this function returns that if.
> >> >   * Otherwise, it returns NULL.
> >> >   */
> >> > @@ -1318,6 +1328,9 @@ void nir_lower_system_values(nir_shader *shader);
> >> >
> >> >  void nir_lower_atomics(nir_shader *shader);
> >> >
> >> > +void nir_live_variables_impl(nir_function_impl *impl);
> >> > +bool nir_ssa_defs_interfere(nir_ssa_def *a, nir_ssa_def *b);
> >> > +
> >> >  void nir_convert_to_ssa_impl(nir_function_impl *impl);
> >> >  void nir_convert_to_ssa(nir_shader *shader);
> >> >  void nir_convert_from_ssa(nir_shader *shader);
> >> > diff --git a/src/glsl/nir/nir_live_variables.c b/src/glsl/nir/nir_live_variables.c
> >> > new file mode 100644
> >> > index 0000000..7d99a06
> >> > --- /dev/null
> >> > +++ b/src/glsl/nir/nir_live_variables.c
> >> > @@ -0,0 +1,282 @@
> >> > +/*
> >> > + * Copyright © 2014 Intel Corporation
> >> > + *
> >> > + * Permission is hereby granted, free of charge, to any person obtaining a
> >> > + * copy of this software and associated documentation files (the "Software"),
> >> > + * to deal in the Software without restriction, including without limitation
> >> > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> >> > + * and/or sell copies of the Software, and to permit persons to whom the
> >> > + * Software is furnished to do so, subject to the following conditions:
> >> > + *
> >> > + * The above copyright notice and this permission notice (including the next
> >> > + * paragraph) shall be included in all copies or substantial portions of the
> >> > + * Software.
> >> > + *
> >> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> >> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> >> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> >> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> >> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> >> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> >> > + * IN THE SOFTWARE.
> >> > + *
> >> > + * Authors:
> >> > + *    Jason Ekstrand (ja...@jlekstrand.net)
> >> > + */
> >> > +
> >> > +#include "nir.h"
> >> > +
> >> > +/*
> >> > + * Basic liveness analysis.  This works only in SSA form.
> >> > + *
> >> > + * This liveness pass treats phi nodes as being melded to the space between
> >> > + * blocks so that the destinations of a phi are in the live-in of the block
> >> > + * in which it resides and the sources are in the live-out of the
> >> > + * corresponding predecessor block.  By formulating the liveness information
> >> > + * in this way, we ensure that the definition of any variable dominates its
> >> > + * entire live range.  This is true because the only way that the definition
> >> > + * of an SSA value may not dominate a use is if the use is in a phi node, and
> >> > + * the uses in the phi node are in the live-out of the corresponding
> >> > + * predecessor block but not in the live-in of the block containing the phi
> >> > + * node.
> >> > + */
> >> > +
> >> > +struct live_variables_state {
> >> > +   unsigned num_ssa_defs;
> >> > +   unsigned bitset_words;
> >> > +   BITSET_WORD progress;
> >>
> >> I'd rather have this be
> >>
> >> bool progress;
> >>
> >> since it's effectively being used as a boolean; no need to confuse
> >> people with this, and it's just unnecessary micro-optimization.
> >
> > Sure.  That can be done.
> > --Jason
> >
> >>
> >> > +};
> >> > +
> >> > +static bool
> >> > +index_dest(nir_dest *dest, void *void_state)
> >> > +{
> >> > +   struct live_variables_state *state = void_state;
> >> > +
> >> > +   if (dest->is_ssa)
> >> > +      dest->ssa.live_index = state->num_ssa_defs++;
> >> > +
> >> > +   return true;
> >> > +}
> >> > +
> >> > +static bool
> >> > +index_ssa_definitions_block(nir_block *block, void *void_state)
> >> > +{
> >> > +   struct live_variables_state *state = void_state;
> >> > +
> >> > +   nir_foreach_instr(block, instr) {
> >> > +      if (instr->type == nir_instr_type_ssa_undef) {
> >> > +         nir_ssa_undef_instr *undef = nir_instr_as_ssa_undef(instr);
> >> > +         undef->def.live_index = 0;
> >> > +      } else {
> >> > +         nir_foreach_dest(instr, index_dest, state);
> >> > +      }
> >> > +   }
> >> > +
> >> > +   return true;
> >> > +}
> >> > +
> >> > +static bool
> >> > +init_liveness_block(nir_block *block, void *void_state)
> >> > +{
> >> > +   struct live_variables_state *state = void_state;
> >> > +
> >> > +   block->live_in = reralloc(block, block->live_in, BITSET_WORD,
> >> > +                             state->bitset_words);
> >> > +   memset(block->live_in, 0, state->bitset_words * sizeof(BITSET_WORD));
> >> > +
> >> > +   block->live_out = reralloc(block, block->live_out, BITSET_WORD,
> >> > +                              state->bitset_words);
> >> > +   memset(block->live_out, 0, state->bitset_words * sizeof(BITSET_WORD));
> >> > +
> >> > +   return true;
> >> > +}
> >> > +
> >> > +static bool
> >> > +set_src_live(nir_src *src, void *void_live)
> >> > +{
> >> > +   BITSET_WORD *live = void_live;
> >> > +
> >> > +   if (!src->is_ssa)
> >> > +      return true;
> >> > +
> >> > +   if (src->ssa->live_index == 0)
> >> > +      return true;   /* undefined variables are never live */
> >> > +
> >> > +   BITSET_SET(live, src->ssa->live_index);
> >> > +
> >> > +   return true;
> >> > +}
> >> > +
> >> > +static bool
> >> > +set_dest_dead(nir_dest *dest, void *void_live)
> >> > +{
> >> > +   BITSET_WORD *live = void_live;
> >> > +
> >> > +   if (dest->is_ssa)
> >> > +      BITSET_CLEAR(live, dest->ssa.live_index);
> >> > +
> >> > +   return true;
> >> > +}
> >> > +
> >> > +/* Phi nodes exist "between" blocks and all the phi nodes at the start of a
> >> > + * block act "in parallel".  When we propagate from the live_in of one
> >> > + * block to the live_out of the other, we have to kill any writes from phis
> >> > + * and make live any sources.
> >> > + */
> >> > +static void
> >> > +propagate_across_edge(nir_block *pred, nir_block *succ,
> >> > +                      struct live_variables_state *state)
> >> > +{
> >> > +   BITSET_WORD live[state->bitset_words];
> >> > +   memcpy(live, succ->live_in, sizeof live);
> >> > +
> >> > +   nir_foreach_instr(succ, instr) {
> >> > +      if (instr->type != nir_instr_type_phi)
> >> > +         break;
> >> > +      nir_phi_instr *phi = nir_instr_as_phi(instr);
> >> > +
> >> > +      set_dest_dead(&phi->dest, live);
> >> > +   }
> >> > +
> >> > +   nir_foreach_instr(succ, instr) {
> >> > +      if (instr->type != nir_instr_type_phi)
> >> > +         break;
> >> > +      nir_phi_instr *phi = nir_instr_as_phi(instr);
> >> > +
> >> > +      foreach_list_typed(nir_phi_src, src, node, &phi->srcs) {
> >> > +         if (src->pred == pred) {
> >> > +            set_src_live(&src->src, live);
> >> > +            break;
> >> > +         }
> >> > +      }
> >> > +   }
> >> > +
> >> > +   for (unsigned i = 0; i < state->bitset_words; ++i) {
> >> > +      state->progress |= live[i] & ~pred->live_out[i];
> >>
> >> state->progress = state->progress || (live[i] & ~pred->live_out[i]) != 0;
> >>
> >> > +      pred->live_out[i] |= live[i];
> >> > +   }
> >> > +}
> >> > +
> >> > +static bool
> >> > +walk_instructions_block(nir_block *block, void *void_state)
> >> > +{
> >> > +   struct live_variables_state *state = void_state;
> >> > +
> >> > +   /* The live_out is the union (modulo phi nodes) of the live-ins of its
> >> > +    * successors */
> >> > +   if (block->successors[0])
> >> > +      propagate_across_edge(block, block->successors[0], state);
> >> > +   if (block->successors[1])
> >> > +      propagate_across_edge(block, block->successors[1], state);
> >> > +
> >> > +   memcpy(block->live_in, block->live_out,
> >> > +          state->bitset_words * sizeof(BITSET_WORD));
> >> > +
> >> > +   nir_if *following_if = nir_block_following_if(block);
> >> > +   if (following_if)
> >> > +      set_src_live(&following_if->condition, block->live_in);
> >> > +
> >> > +   nir_foreach_instr_reverse(block, instr) {
> >> > +      /* Phi nodes are handled separately, so we want to skip them.  Since
> >> > +       * we are going backwards and they are at the beginning, we can just
> >> > +       * break as soon as we see one.
> >> > +       */
> >> > +      if (instr->type == nir_instr_type_phi)
> >> > +         break;
> >> > +
> >> > +      nir_foreach_dest(instr, set_dest_dead, block->live_in);
> >> > +      nir_foreach_src(instr, set_src_live, block->live_in);
> >> > +   }
> >> > +
> >> > +   return true;
> >> > +}
> >> > +
> >> > +static bool
> >> > +src_does_not_use_def(nir_src *src, void *def)
> >> > +{
> >> > +   return !src->is_ssa || src->ssa != (nir_ssa_def *)def;
> >> > +}
> >> > +
> >> > +static bool
> >> > +search_for_use_after_instr(nir_instr *start, nir_ssa_def *def)
> >> > +{
> >> > +   /* Only look for a use strictly after the given instruction */
> >> > +   struct exec_node *node = start->node.next;
> >> > +   while (!exec_node_is_tail_sentinel(node)) {
> >> > +      nir_instr *instr = exec_node_data(nir_instr, node, node);
> >> > +      if (!nir_foreach_src(instr, src_does_not_use_def, def))
> >> > +         return true;
> >> > +      node = node->next;
> >> > +   }
> >> > +   return false;
> >> > +}
> >> > +
> >> > +/* Returns true if def is live at instr, assuming that def comes before
> >> > + * instr in a pre-order DFS of the dominance tree.
> >> > + */
> >> > +static bool
> >> > +nir_ssa_def_is_live_at(nir_ssa_def *def, nir_instr *instr)
> >> > +{
> >> > +   if (BITSET_TEST(instr->block->live_out, def->live_index)) {
> >> > +      /* Since def dominates instr, if def is in the live_out of the block,
> >> > +       * it's live at instr.
> >> > +       */
> >> > +      return true;
> >> > +   } else {
> >> > +      if (BITSET_TEST(instr->block->live_in, def->live_index) ||
> >> > +          def->parent_instr->block == instr->block) {
> >> > +         /* In this case it is either live coming into instr's block or it
> >> > +          * is defined in the same block.  Either way, we simply need to
> >> > +          * see if it is used after instr.
> >> > +          */
> >> > +         return search_for_use_after_instr(instr, def);
> >> > +      } else {
> >> > +         return false;
> >> > +      }
> >> > +   }
> >> > +}
> >> > +
> >> > +bool
> >> > +nir_ssa_defs_interfere(nir_ssa_def *a, nir_ssa_def *b)
> >> > +{
> >> > +   if (a->parent_instr == b->parent_instr) {
> >> > +      /* Two variables defined at the same time interfere assuming at
> >> > +       * least one isn't dead.
> >> > +       */
> >> > +      return true;
> >> > +   } else if (a->live_index == 0 || b->live_index == 0) {
> >> > +      /* If either variable is an ssa_undef, then there's no interference */
> >> > +      return false;
> >> > +   } else if (a->live_index < b->live_index) {
> >> > +      return nir_ssa_def_is_live_at(a, b->parent_instr);
> >> > +   } else {
> >> > +      return nir_ssa_def_is_live_at(b, a->parent_instr);
> >> > +   }
> >> > +}
> >> > +
> >> > +void
> >> > +nir_live_variables_impl(nir_function_impl *impl)
> >> > +{
> >> > +   struct live_variables_state state;
> >> > +
> >> > +   /* We start at 1 because we reserve the index value of 0 for ssa_undef
> >> > +    * instructions.  Those are never live, so their liveness information
> >> > +    * can be compacted into a single bit.
> >> > +    */
> >> > +   state.num_ssa_defs = 1;
> >> > +   nir_foreach_block(impl, index_ssa_definitions_block, &state);
> >> > +
> >> > +   /* We now know how many unique ssa definitions we have and we can go
> >> > +    * ahead and allocate live_in and live_out sets
> >> > +    */
> >> > +   state.bitset_words = BITSET_WORDS(state.num_ssa_defs);
> >> > +   nir_foreach_block(impl, init_liveness_block, &state);
> >> > +
> >> > +   /* We need to propagate the liveness back through the CFG.  Thanks to
> >> > +    * the wonders of SSA, this will run no more times than the depth of the
> >> > +    * deepest loop + 1.
> >> > +    */
> >> > +   do {
> >> > +      state.progress = 0;
> >>
> >> state.progress = false;
> >>
> >> > +      nir_foreach_block_reverse(impl, walk_instructions_block, &state);
> >> > +   } while (state.progress);
> >> > +}
> >> > diff --git a/src/glsl/nir/nir_metadata.c b/src/glsl/nir/nir_metadata.c
> >> > index 070cb74..a4d618c 100644
> >> > --- a/src/glsl/nir/nir_metadata.c
> >> > +++ b/src/glsl/nir/nir_metadata.c
> >> > @@ -39,6 +39,8 @@ nir_metadata_require(nir_function_impl *impl, nir_metadata required)
> >> >        nir_index_blocks(impl);
> >> >     if (NEEDS_UPDATE(nir_metadata_dominance))
> >> >        nir_calc_dominance_impl(impl);
> >> > +   if (NEEDS_UPDATE(nir_metadata_live_variables))
> >> > +      nir_live_variables_impl(impl);
> >> >
> >> >  #undef NEEDS_UPDATE
> >> >
> >> > diff --git a/src/mesa/main/bitset.h b/src/mesa/main/bitset.h
> >> > index 601fd0e..2558da4 100644
> >> > --- a/src/mesa/main/bitset.h
> >> > +++ b/src/mesa/main/bitset.h
> >> > @@ -32,6 +32,7 @@
> >> >  #define BITSET_H
> >> >
> >> >  #include "imports.h"
> >> > +#include "macros.h"
> >> >
> >> >  /****************************************************************************
> >> >   * generic bitset implementation
> >> > --
> >> > 2.2.0
> >> >
> >> > _______________________________________________
> >> > mesa-dev mailing list
> >> > mesa-dev@lists.freedesktop.org
> >> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev