On Sat, May 16, 2015 at 1:01 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> On Sat, May 16, 2015 at 12:59 PM, Matt Turner <matts...@gmail.com> wrote:
>> On Sat, May 16, 2015 at 12:45 PM, Jason Ekstrand <ja...@jlekstrand.net>
>> wrote:
>>> On Sat, May 16, 2015 at 12:12 PM, Matt Turner <matts...@gmail.com> wrote:
>>>> On Fri, May 8, 2015 at 3:27 AM, Kenneth Graunke <kenn...@whitecape.org>
>>>> wrote:
>>>>> Looking at a couple of the shaders that are still worse off...it looks
>>>>> like a ton of Source shaders used to do MUL/ADD with an attribute and
>>>>> two immediates, and now are doing MOV/MOV/MAD.
>>>>
>>>> I just looked, and thought that too for a minute, but it actually
>>>> shouldn't be doing that. Take for instance:
>>>>
>>>> shaders/closed/steam/dota-2/498.shader_test VS SIMD8: 47 -> 53 (12.77%)
>>>>
>>>> It indeed replaces 6x MUL/ADD pairs with MOV/MAD (introducing 6 extra
>>>> MOVs), but....
>>>>
>>>> Without NIR we have
>>>>
>>>> mul(8) g15<1>F g6<8,8,1>F 6F
>>>> ...
>>>> add(8) g16<1>F g15<8,8,1>F 2.1F
>>>> add(8) g35<1>F g15<8,8,1>F 3.1F
>>>> add(8) g42<1>F g15<8,8,1>F 4.1F
>>>> add(8) g45<1>F g15<8,8,1>F 5.1F
>>>> add(8) g48<1>F g15<8,8,1>F 0.1F
>>>> add(8) g51<1>F g15<8,8,1>F 1.1F
>>>>
>>>> That is, one multiply is consumed by 6 adds.
>>>>
>>>> With NIR we have
>>>>
>>>> mov(1) g22<1>F 2.1F
>>>> mov(1) g22.1<1>F 6F
>>>> mad(8) g16<1>F g22<0,1,0>.xF g22.1<0,1,0>.xF g6<4,4,1>F
>>>> mov(1) g22.2<1>F 3.1F
>>>> mad(8) g23<1>F g22.2<0,1,0>.xF g22.1<0,1,0>.xF g6<4,4,1>F
>>>> mov(1) g22.3<1>F 4.1F
>>>> mad(8) g30<1>F g22.3<0,1,0>.xF g22.1<0,1,0>.xF g6<4,4,1>F
>>>> mov(1) g22.4<1>F 5.1F
>>>> mad(8) g33<1>F g22.4<0,1,0>.xF g22.1<0,1,0>.xF g6<4,4,1>F
>>>> mov(1) g22.5<1>F 0.1F
>>>> mad(8) g36<1>F g22.5<0,1,0>.xF g22.1<0,1,0>.xF g6<4,4,1>F
>>>> mov(1) g22.6<1>F 1.1F
>>>> mad(8) g39<1>F g22.6<0,1,0>.xF g22.1<0,1,0>.xF g6<4,4,1>F
>>>>
>>>> So we're doing the g6 * 6F operation 6 times!
>>>> We see this in the NIR as well:
>>>>
>>>> vec1 ssa_419 = ffma ssa_384, ssa_132, ssa_133
>>>> vec1 ssa_423 = ffma ssa_384, ssa_132, ssa_135
>>>> vec1 ssa_427 = ffma ssa_384, ssa_132, ssa_137
>>>> vec1 ssa_428 = ffma ssa_384, ssa_132, ssa_139
>>>> vec1 ssa_429 = ffma ssa_384, ssa_132, ssa_141
>>>> vec1 ssa_430 = ffma ssa_384, ssa_132, ssa_144
>>>>
>>>> Whoops. Ideas for fixing that? I'm guessing that this accounts for
>>>> nearly all of the remaining 1120 hurt programs.
>>>
>>> Ugh... We've been tacitly assuming that your constant combine stuff
>>> will magically make immediates not a problem. In this case, they are
>>> a problem. I guess we could do something different for 1 vs. 2
>>> immediates.
>>
>> That's not really the problem as far as I see. I mean, we could split
>> MADs that do x * imm + imm, but I would think NIR shouldn't be
>> combining these operations if the multiply is used in a bunch of
>> places.
>>
>> The current code in the ffma peephole does... to quote the comment:
>>
>> /* Only absorb a fmul into a ffma if the fmul is only used in fadd
>>  * operations. This prevents us from being too aggressive with our
>>  * fusing which can actually lead to more instructions.
>>  */
>>
>> Can't we pretty trivially modify that to count the number of uses as
>> well and only combine if it's used in one place?
>>
>> To be honest, before I looked in the code I thought that's what it was doing.
>
> If you want to know why I did it that way, just run shader-db. :-)
Ok, longer, less snarky version: I found a variety of places where the
user was doing, for instance, 2 muls and 4 adds where the result of each
mul is used twice. Fusing those gives just 4 MADs; with a single-use rule
the result would be 6 instructions instead. It's entirely possible that,
thanks to latencies, the 6 would actually be better, but that's why I did
it that way.

--Jason
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev