On Tue, Sep 22, 2015 at 4:29 PM, Matt Turner <matts...@gmail.com> wrote:
> On Mon, Sep 21, 2015 at 7:22 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
>> On Mon, Sep 21, 2015 at 6:15 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
>>>
>>> On Sep 21, 2015 5:45 PM, "Matt Turner" <matts...@gmail.com> wrote:
>>>>
>>>> On Mon, Sep 21, 2015 at 3:18 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
>>>> > At this point, piglit is the same as for GLSL and the shader-db numbers
>>>> > are looking pretty good. On SNB, GLSL vs. NIR for vec4 programs is:
>>>> >
>>>> > total instructions in shared programs: 2020573 -> 1822601 (-9.80%)
>>>> > instructions in affected programs: 1883334 -> 1685362 (-10.51%)
>>>> > helped: 13328
>>>> > HURT: 3594
>>>> >
>>>> > and there are patches on the list that improve this to
>>>> >
>>>> > total instructions in shared programs: 2020283 -> 1805487 (-10.63%)
>>>> > instructions in affected programs: 1855759 -> 1640963 (-11.57%)
>>>> > helped: 14142
>>>> > HURT: 2346
>>>>
>>>> Wow, that's great. I didn't realize we were that close.
>>>>
>>>> That said, I don't feel like we're /quite/ ready for this (especially
>>>> with outstanding optimization patches on the list). I'm not sure what
>>>> patches are pending.
>>>
>>> Only two: the one you sent today and Alejandro's patch to make copy
>>> propagation less type-sensitive.
>>>
>>>> Some things I've seen in digging through hurt programs today:
>>>>
>>>> portal-2/high/5134 emits:
>>>>
>>>> vec1 ssa_53 = flog2 ssa_52
>>>> vec1 ssa_54 = flog2 ssa_52.y
>>>> vec1 ssa_55 = flog2 ssa_52.z
>>>> vec4 ssa_56 = vec4 ssa_53, ssa_54, ssa_55, ssa_42.w
>>>> vec3 ssa_57 = fmul ssa_56, ssa_3
>>>> vec1 ssa_58 = fexp2 ssa_57
>>>> vec1 ssa_59 = fexp2 ssa_57.y
>>>> vec1 ssa_60 = fexp2 ssa_57.z
>>>> vec4 ssa_61 = vec4 ssa_58, ssa_59, ssa_60, ssa_42.w
>>>>
>>>> which we didn't transform into a vec3 pow with or without NIR but we
>>>> really should. Why isn't NIR able to handle this?
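The transform Matt is asking about rests on the identity exp2(log2(a) * b) = pow(a, b), which holds for a > 0. Below is a minimal sketch of matching that pattern on a hypothetical scalar expression tree. The `expr` type and `try_fold_pow` are illustrative only. They are not NIR's data structures or its nir_search machinery. As written, the sketch would not fire on the listing above. A vec4 constructor (ssa_56) sits between the flog2s and the fmul, and getting past it is the vector-level work discussed in the replies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar expression tree; not NIR's nir_alu_instr. */
typedef enum { OP_INPUT, OP_FLOG2, OP_FMUL, OP_FEXP2, OP_FPOW } op_kind;

typedef struct expr {
   op_kind op;
   struct expr *src[2];   /* unused source slots are NULL */
} expr;

/* Rewrite fexp2(fmul(flog2(a), b)) into fpow(a, b) in place, relying on
 * exp2(log2(a) * b) == pow(a, b) for a > 0. fmul is commutative, so the
 * flog2 may sit in either of its sources. The old fmul/flog2 nodes are
 * left behind for dead-code elimination. Returns 1 on a match, else 0. */
static int try_fold_pow(expr *e)
{
   assert(e != NULL);
   if (e->op != OP_FEXP2)
      return 0;

   expr *mul = e->src[0];
   if (mul == NULL || mul->op != OP_FMUL)
      return 0;

   for (int i = 0; i < 2; i++) {
      expr *lg = mul->src[i];
      if (lg != NULL && lg->op == OP_FLOG2) {
         e->op = OP_FPOW;
         e->src[0] = lg->src[0];       /* base a */
         e->src[1] = mul->src[1 - i];  /* exponent b */
         return 1;
      }
   }
   return 0;
}
```

Applied per channel, this would turn each fexp2(fmul(flog2(x), y)) into a scalar fpow. Forming the single vec3 pow would still take a vectorizer.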
>>
>> Ken and I were talking about this today. What it comes down to is
>> that no one has written the pass yet. We haven't done that many
>> vector optimizations to date.
>
> Yeah, I'm not concerned about this. We weren't doing it before either,
> so it's not a regression.
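The missing pass has to find scalar ops like the three flog2s in the portal-2 listing and fuse them into one vector op. Those ops share an opcode and each reads a different channel of ssa_52. The sketch below shows only the core merge test, using hypothetical structs (`scalar_alu`, `vector_alu`, `try_vectorize`) rather than NIR's real data structures. A real pass would also have to find candidate groups and check that the instructions can be moved next to each other. It would then have to rewrite the users (here the vec4 that gathers the results) to read channels of the new vector value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode ids; not NIR's nir_op enum. */
enum { OP_FLOG2 = 1, OP_FEXP2 = 2 };

/* A scalar ALU op reading one channel of an SSA vector:
 * dst = op(src.chan), e.g. "vec1 ssa_54 = flog2 ssa_52.y". */
typedef struct {
   int op;
   int src;   /* SSA index of the vector source */
   int chan;  /* channel read: 0 = x, 1 = y, 2 = z, 3 = w */
} scalar_alu;

/* The merged op: dst = op(src.swizzle[0..num_components-1]). */
typedef struct {
   int op;
   int src;
   int num_components;
   int swizzle[4];
} vector_alu;

/* Try to merge n scalar ops into one n-component vector op. They can
 * merge only if they share the opcode and the source; each scalar op's
 * channel becomes one swizzle component of the merged op. */
static bool try_vectorize(const scalar_alu *ins, int n, vector_alu *out)
{
   assert(n >= 1 && n <= 4);
   for (int i = 1; i < n; i++) {
      if (ins[i].op != ins[0].op || ins[i].src != ins[0].src)
         return false;
   }
   out->op = ins[0].op;
   out->src = ins[0].src;
   out->num_components = n;
   for (int i = 0; i < n; i++)
      out->swizzle[i] = ins[i].chan;
   return true;
}
```

For the portal-2 program, the flog2s on ssa_52.x, .y and .z would merge into one 3-component flog2 with an xyz swizzle. The fexp2s on ssa_57 would merge the same way.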
Yeah, writing a vectorizer is something that should probably be done, but
it isn't any more urgent than any other "make vec4 better" thing.

>>>> (also, why isn't
>>>> ".x" printed when the use of an SSA value is scalar, e.g., in the
>>>> assignment of ssa_58 the RHS should use ssa_57.x).
>>
>> It doesn't print the identity swizzle.
>
> Patch sent.
>
>>>> We generate worse code for all_equal/any_nequal/any.
>>
>> Yes, we should fix that. Suggestions/patches welcome; I don't have
>> any hot ideas at the moment.
>>
>>>> book-of-unwritten-tales/original/vp-33 (a vertex program) uses
>>>> DPH, and NIR doesn't have DPH. NIR should probably grow a DPH
>>>> instruction even if we don't have an optimization to recognize
>>>> open-coded DPH.
>>
>> We could detect fdot(vec4(a.x, a.y, a.z, 1)) in the backend if we
>> really wanted to. The long-term solution is probably to add swizzle
>> support to nir_search, but that's going to be a real bear. How would
>> having the nir_op_dph instruction help if we can't recognize it?
>
> Because the ARB vertex program language and the Mesa IR that we
> translate from both have DPH already. We're just throwing it away
> because NIR doesn't have DPH.

Right. Should be easy enough to add. I can do that if you'd like, or you
can; I don't care.

>>>> Lots of things hurt because of lack of global copy/constant
>>>> propagation. I think NIR often emits the constant loads in blocks
>>>> earlier than their uses and the backend optimizations aren't able to
>>>> cope. See team-fortress-2/2197 for example (search for 953267991D, the
>>>> bit pattern of 0.0001F printed as a decimal integer).
>>
>> Hrm... One option would be to copy-prop load_const in emit_alu. This
>> should be easy enough to do if we detect that it's a 2-src instruction
>> and one source is an immediate. We could also do global copy-prop, but
>> I don't know how hard that is.
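Jason's emit_alu suggestion comes down to deciding, for a two-source instruction, whether a load_const value can be encoded as an immediate directly. Otherwise it has to be MOVed into a register first. Here is a minimal sketch. It assumes, as on Intel Gen hardware, that a two-source instruction may take an immediate only in its second source. The `operand` struct and `legalize_immediates` are hypothetical and are not Mesa's backend code. This is a purely local fix: it helps only where the use can see the constant, so it does not replace global copy propagation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operand: a register, or an immediate that came from a
 * load_const. Not Mesa's src_reg/fs_reg types. */
typedef struct {
   bool is_imm;
   int reg;     /* meaningful when !is_imm */
   float imm;   /* meaningful when is_imm */
} operand;

/* Assumes only src[1] of a 2-src instruction may be an immediate.
 * Returns true if the (possibly swapped) operands can be emitted as-is;
 * false if src[0] is an immediate that must first be MOVed into a
 * register. */
static bool legalize_immediates(operand src[2], bool commutative)
{
   if (!src[0].is_imm)
      return true;              /* src[1] may be a register or an immediate */

   if (!src[1].is_imm && commutative) {
      operand tmp = src[0];     /* move the immediate into the legal slot */
      src[0] = src[1];
      src[1] = tmp;
      return true;
   }

   /* Non-commutative op, or both sources constant (constant folding
    * should normally catch the latter): keep a MOV for src[0]. */
   return false;
}
```

For example, a `fmul` of a constant by a register would be swapped so that the constant sits in src1. No separate MOV of the constant would then be needed.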
>>
>>>> I remember this issue from the FS/NIR backend as well, but dota-2/504
>>>> (and others) emit:
>>>>
>>>> mad(8) g16<1>.xF g11<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF
>>>> mad(8) g19<1>.xF g10<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF
>>>> mad(8) g22<1>.xF g9<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF
>>>> mad(8) g25<1>.xF g8<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF
>>>> mad(8) g28<1>.xF g7<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF
>>>> mad(8) g31<1>.xF g6<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF
>>>>
>>>> where the multiplication is duplicated. I can't remember what we decided.
>>
>> If I remember correctly, it came down to "optimization is hard" and we
>> said "good enough" about our current heuristics.
>
> I dug out the old threads, but all I found was a snarky reply.
>
> Patch sent.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev