On Tue, Jun 10, 2014 at 3:34 PM, Matt Turner <matts...@gmail.com> wrote: > On Wed, May 28, 2014 at 4:44 PM, Connor Abbott <cwabbo...@gmail.com> wrote: >> On Tue, May 27, 2014 at 9:47 PM, Matt Turner <matts...@gmail.com> wrote: >>> Here's a respin of my load_payload series from mid-April with some >>> feedback from Ken addressed and some bugs fixed. >>> >>> This series is available in my tree (with a few unrelated patches >>> before it) >>> >>> git://people.freedesktop.org/~mattst88/mesa tex-sources >>> >>> This is a prep series for implementing SSA in the i965 fragment >>> shader backend. >> >> I wonder how SSA in the i965 backend will interact with my SSA in GLSL >> IR series? > > I'm not really sure. Of course it'd be nice to not have a bunch of > similar code in src/glsl and the i965 driver. > > I've got a bunch of patches on the cfg branch of my tree > > git://people.freedesktop.org/~mattst88/mesa cfg > > that start implementing SSA in the fs backend. It's got code to create > the dominance tree and dominance frontier sets, to insert Phi nodes, > and to rename variables. I've also adapted my local value numbering > pass to operate globally, but of course it requires global code motion > to perform fix ups.
Nice! The way it works in GLSL IR is much worse, unfortunately, because we don't have the cfg or the dominance tree (see my discussions with Eric on that...) so we have to basically fake it in a really tricky way. Inspired by those difficulties, and by the fact that many SSA passes (like value numbering) require a flat IR without expression trees, I took up Ken's suggestion and started to design another IR that would have all the properties we want. Right now, it's basically just a header file, but if the experiment works out then maybe we can consider going GLSL IR -> new IR -> backend instead... it would require rewriting the code to lower the IR to FS instructions, but it should be a *lot* easier this time. Anyways, I'd be glad to discuss all this in person next week. > > I haven't really started on translating out of SSA, because it's been > hard to get interested when performing register allocation from SSA > seems so appealing. That's obviously a sizeable amount of work. Yeah... I just got the motivation to read "Register allocation for programs in SSA-form," and it seems really interesting. Coalescing and phi node elimination seem difficult though, especially if you don't have a swap instruction (apparently register allocation is only NP-complete if you don't have a swap instruction... who'd-a thunk that!) > > Predication is another problem I need to handle. Gated SSA was > suggested to me as the best solution, with some other nice properties > that allow things like divergence analysis. I looked at that, and it doesn't seem very interesting - it doesn't really do much for us thanks to the extra information that the structured control flow gives to us. The only interesting thing there are the alpha statements, which allow one to represent arrays in SSA form. This isn't a new idea - the same thing was suggested in the original to-SSA paper by Cytron et. al. where it was called the "Update" operation - and I'm not sure how useful that is. The problem is that you may wind up with extra copies of array variables after conversion out of SSA, in which case you'd have to add a loop to copy each element of the array, which is a lot more expensive than introducing copies of non-array variables. The solution I was thinking of goes something like this: flatten if-statements in SSA form, adding a predicate to each instruction in the if (either the condition for the then statements or its inverse for the else statements) and converting phi nodes to conditional selects. This should probably be done right before conversion out of SSA (or register allocation) so other passes don't have to deal with the predicates. Then, while converting out of SSA or doing coalescing during SSA-based register allocation, add the following condition to the test if two SSA variables interfere (the one proposed by Budimlic et. al. and extended by Boissinot, Darte, and Rastello): If A is defined in the same basic block as B, A dominates B, and A is dead at the end of the basic block (i.e. it's not part of the live-out set), then normally we would go backwards from the end of the basic block to find a use of A that would make A live during the definition of B (and therefore, A and B would interfere). But if the instruction where A is used has a complementary predicate to the instruction where B is defined, then A and B do not interfere and we can ignore the use and move on. Here, two predicates are "complementary" if it is impossible for both to be true at once. Note that special care needs to be taken with conditional selects: if we have C = csel(A, B, pred), then the use of A is predicated by pred and the use of B is predicated by !pred. Finally, for each conditional select, try and coalesce the sources of each conditional select together to turn it into a move (and if that succeeds, then try and coalesce the sources with the destination). This can be done at the same time that the program is being converted to CSSA form - essentially, "phi congruence classes" become "phi-and-conditional-select congruence classes" (since, after all, the conditional selects used to be phi nodes). This allows us to handle programs like: if (pred) { A = ... } else { A = ... } which, after SSA-ification and if-flattening, becomes: (pred) A_1 = ... (!pred) A_2 = ... A_3 = csel(A_1, A_2, pred) The only use of A_1 has predicate "pred", while the definition of A_2 has predicate "!pred," so A_1 and A_2 do not interfere and they can be coalesced together, eliminating the csel. Another example: A = ... ... = A if (pred) { A = ... } else { ... = A } which becomes: A_1 = ... ... = A_1 (pred) A_2 = ... (!pred) ... = A_1 A_3 = csel(A_2, A_1, pred) Again, the only uses of A_1 after A_2 is defined have predicate "!pred," so A_1 and A_2 don't interfere. Determining if two variables are complementary may be complicated: two SSA variables pred1 and pred2 can be a series of ands: pred1 = A and B and C and .... pred2 = D and E and F and .... and if any one of the possible pairs are opposite (A = !D, A = !E, B = !D, etc.), then pred1 and pred2 will be complimentary. Typically, A, B, C, etc. will be the conditions of former if-statements. The first step is to figure out which variables are opposite, i.e. A = !B. Cases like A = (foo > bar) and B = (bar >= foo) also need to considered here. So the algorithm, similar to value numbering, goes like: for each comparison A (<, >=, ==, !=) B: let a_index be the index of A, b_index be the index of B, and cond be the condition if there exists an instruction with (!cond, a_index, b_index) in the hash table, then it and the current instruction are opposite insert the instruction into the hash table, with (cond, a_index, b_index) Next, to find the complementary variables: Start by marking all the opposite pairs of variables as complementary. for each and-statement C = A and B, in the order that they are defined (i.e. in reverse post-order): for each variable D that is complementary to A: mark C and D as complementary for each variable E that is complementary to B: mark C and E as complementary > >> I've recently re-started work on this after my school work >> started winding down, and as of now here are work items remaining I >> can think of: >> >> 1) Add unit tests for the conversion to SSA >> 2) Make the conversion out of SSA not suck as much, esp. with respect >> to writemasked operations (this is pretty difficult, as apparently >> it's still an unsolved problem that to my knowledge no one from >> academia has tried to tackle...) >> 3) Add some trivial SSA-based optimizations (dead code elimination, >> copy propagation) >> 4) Add more complicated optimizations like CSE, constant propagation, >> GVN-GCM... I believe some of these, especially value numbering, depend >> on a flat IR to work, so this might be a lot harder to do. >> >> I propose that I just do #1, post a new patch series with Paul's >> comments on the original one addressed (that would probably take less >> than a week), and then you can use the opt_to_ssa pass to avoid having >> to duplicate that logic that in the backend since it's pretty >> complicated. So it would look like: >> >> GLSL IR -> opt_to_ssa -> SSA-ified GLSL IR -> i965 fragment shader backend > > That sounds good. I'd even settle for getting non-SSA GLSL IR in the > backend. I expect SSA GLSL IR will allow lots of our optimizations to > be more effective, and I've got a shader from a benchmark that I could > cut a pile of instructions out of if only it didn't unnecessarily > reuse a variable. SSA would totally fix that. > >> We would have to modify the pass that scalarizes the GLSL IR to work >> with phi nodes and ir_quadop_vector expressions, but that wouldn't be >> too difficult. That way we would have an immediate user of the SSA >> stuff while being able to make it useful for everyone else in the long >> term. > > Sure. Except I've written the to-SSA code already, and am lacking the > from-SSA equivalent. :) _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev