Thanks. Looked through stats and at some of the regressions. Some of the areas I noticed we were doing worse:
We generate two CMPs for discard_if; only one without NIR. I think you had an idea about solving this. SEL peephole interference -- we knew about this in general, but I found an interesting failure mode: NIR generates (+f0) if(8) JIP: 4 UIP: 5 else(8) JIP: 3 mov(8) g124<1>F -g124<8,8,1>F endif(8) JIP: 6 We should of course improve dead-control-flow-elimination to remove the else and set the if's predicate_inverse, but non-NIR generates (+f0) sel(8) g124<1>F g25<8,8,1>F -g25<8,8,1>F which is probably better even than a predicated MOV since it's not a partial write and subject to optimization restrictions. The assembly matches the NIR exactly, so I'm assuming NIR is removing a self-assignment in the if-then block, preventing it from turning it into a conditional select. Also, since the fs's SEL peephole only recognizes MOVs to the same destination in order, it doesn't handle tropico-5/387 properly, which already improved from 110 to 79 instructions. I also noticed an interesting pattern in defense-grid/537 (and some others, I think). NIR generates mul(8) g4<1>F g2<8,8,1>F g3<8,8,1>F mul(8) g6<1>F g5<8,8,1>F g4<8,8,1>F mad(8) g127<1>F g6<4,4,1>F g3<4,4,1>F g2<4,4,1>F whereas without NIR we get: mul(8) g5<1>F g2<8,8,1>F g4<8,8,1>F mad(8) g127<1>F g5<4,4,1>F g3<4,4,1>F g5<4,4,1>F Again, the assembly matches the NIR, so NIR is doing something bad. The changes to sun-temple/209 (and a lot of other sun temple shaders) look wrong. I'm not sure if the translation from NIR is doing something bad or what, but we should have 8 texture fetches and we only end up with 2. Some highlights -- Some shaders are improved because NIR recognizes exp2(log(x) * y) as pow(x, y) when the GLSL optimizations couldn't, because its dead code elimination is much too stupid to allow tree grafting to put the IR in the form we expect. This accounts for a low of the high percentage gains. It cuts some Kerbal Space Program shaders in half (~1000 -> ~500 instructions) and removes ~100 spills and ~100 fills from each. Not sure how it actually accomplished this. Too hard to read the assembly with all of the spills. Code generated by tropico-5/252 is really stupid. We MOVs both before and in an if statement that do the same thing. NIR cleans that up easily as a side-effect of SSA. In strike-suit-zero/156 we emit more MADs that we couldn't recognize from the GLSL IR, since again, its DCE is terrible and we can't combine trees. In the-binding-of-isaac-rebirth/18, it's able to CSE some duplicate rcp expressions in different basic blocks. I'm not sure why the backend isn't doing this. At this point I've looked through all of the shaders helped by >15% or more. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev