On Wed, Mar 18, 2015 at 11:27 AM, Jason Ekstrand <ja...@jlekstrand.net> wrote: > On Wed, Mar 18, 2015 at 7:22 AM, Connor Abbott <cwabbo...@gmail.com> wrote: >> On Wed, Mar 18, 2015 at 1:56 AM, Matt Turner <matts...@gmail.com> wrote: >>> Thanks. Looked through stats and at some of the regressions. >>> >>> Some of the areas I noticed we were doing worse: >>> >>> We generate two CMPs for discard_if; only one without NIR. I think you >>> had an idea about solving this. >>> >>> SEL peephole interference -- we knew about this in general, but I >>> found an interesting failure mode: >>> >>> NIR generates >>> >>> (+f0) if(8) JIP: 4 UIP: 5 >>> else(8) JIP: 3 >>> mov(8) g124<1>F -g124<8,8,1>F >>> endif(8) JIP: 6 >>> >>> We should of course improve dead-control-flow-elimination to remove >>> the else and set the if's predicate_inverse, but non-NIR generates >>> >>> (+f0) sel(8) g124<1>F g25<8,8,1>F -g25<8,8,1>F >>> >>> which is probably better even than a predicated MOV since it's not a >>> partial write and subject to optimization restrictions. The assembly >>> matches the NIR exactly, so I'm assuming NIR is removing a >>> self-assignment in the if-then block, preventing it from turning it >>> into a conditional select. >> >> I think this is because the sel peephole (along with all the other >> optimizations) in NIR only works before we lower to source mods, where >> this code probably looked something like: >> >> ssa_1 = ...; >> if ... { >> } else { >> ssa_2 = fneg ssa_1; >> } >> ssa_3 = phi ssa_1, ssa_2; > > More specifically, we probably got > > b = foo ? a : -a > > from the GLSL source which NIR turns into > > if (foo) { > ssa_1 = a; > } else { > ssa_2 = -a; > } > ssa_3 = phi(ssa_1, ssa_2) > > which gets copy-propagated to what Connor mentioned above. > >> So we can probably fix it just by adding fneg/ineg to >> block_check_for_allowed_instrs(). > > That was my first thought but we're starting to get into the world of > backend-specific knowledge there. Maybe this is where we just need to > pass in Eric's backend info struct and predicate doing so on whether > or not neg is free? But I guess the only compiler we care about at > the moment where that isn't true is vc4 which can't do control-flow > right now anyway. Thoughts? > > We may also want to simply make the backend's peephole smarter.
I thought about predicating it on !lower_neg_to_sub, but if a backend sets that then we'll already have lowered the negates to subtracts. We could have a separate "my backend uses neg, but it's not actually free" thing, or add a general "how much does this thing cost" callback to nir_shader_compiler_options, but it feels like overkill to me right now. Maybe we'll need it later, though... > >>> >>> Also, since the fs's SEL peephole only recognizes MOVs to the same >>> destination in order, it doesn't handle tropico-5/387 properly, which >>> already improved from 110 to 79 instructions. >>> >>> I also noticed an interesting pattern in defense-grid/537 (and some >>> others, I think). NIR generates >>> >>> mul(8) g4<1>F g2<8,8,1>F g3<8,8,1>F >>> mul(8) g6<1>F g5<8,8,1>F g4<8,8,1>F >>> mad(8) g127<1>F g6<4,4,1>F g3<4,4,1>F g2<4,4,1>F >>> >>> whereas without NIR we get: >>> >>> mul(8) g5<1>F g2<8,8,1>F g4<8,8,1>F >>> mad(8) g127<1>F g5<4,4,1>F g3<4,4,1>F g5<4,4,1>F >>> >>> Again, the assembly matches the NIR, so NIR is doing something bad. >>> >>> The changes to sun-temple/209 (and a lot of other sun temple shaders) >>> look wrong. I'm not sure if the translation from NIR is doing >>> something bad or what, but we should have 8 texture fetches and we >>> only end up with 2. >>> >>> >>> Some highlights -- >>> >>> Some shaders are improved because NIR recognizes exp2(log(x) * y) as >>> pow(x, y) when the GLSL optimizations couldn't, because its dead code >>> elimination is much too stupid to allow tree grafting to put the IR in >>> the form we expect. This accounts for a low of the high percentage >>> gains. >>> >>> It cuts some Kerbal Space Program shaders in half (~1000 -> ~500 >>> instructions) and removes ~100 spills and ~100 fills from each. Not >>> sure how it actually accomplished this. Too hard to read the assembly >>> with all of the spills. >>> >>> Code generated by tropico-5/252 is really stupid. We MOVs both before >>> and in an if statement that do the same thing. NIR cleans that up >>> easily as a side-effect of SSA. >>> >>> In strike-suit-zero/156 we emit more MADs that we couldn't recognize >>> from the GLSL IR, since again, its DCE is terrible and we can't >>> combine trees. >>> >>> In the-binding-of-isaac-rebirth/18, it's able to CSE some duplicate >>> rcp expressions in different basic blocks. I'm not sure why the >>> backend isn't doing this. >>> >>> >>> At this point I've looked through all of the shaders helped by >15% or more. >>> _______________________________________________ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev