On Wed, Jan 14, 2015 at 1:52 PM, Matt Turner <matts...@gmail.com> wrote:
> On Wed, Jan 14, 2015 at 1:29 PM, Matt Turner <matts...@gmail.com> wrote:
>> glsl: Optimize certain if-statements to just casts from the condition
>
> Cherry-picked to master, the shader-db results are
>
> total instructions in shared programs: 5965630 -> 5952789 (-0.22%)
> instructions in affected programs: 737228 -> 724387 (-1.74%)
> GAINED: 5
> LOST: 16
>
> and we hurt 20 programs: 12 vec4 programs significantly (>68%) and 8
> SIMD8/16 programs by 1 instruction.

It looks like the vec4 programs have loops that are now able to be
unrolled, so those are actually improvements.

> This, and seemingly every other work-in-progress branch I have really
> highlights the improvements we need to make to instruction scheduling.
> It feels fitting that one of the last significant changes Eric made to
> scheduling has a commit message that says "This is madness, [...]"
>
>> i965/fs: Emit smarter code for b2f
>
> I wouldn't expect this one to change instruction counts, except on gen
> <= 5 where maybe we get to skip the true/false resolve (apparently I
> did it wrong in my merge -- it caused a bunch of failures on G45 and
> ILK according to Jenkins). On Haswell,
>
> total instructions in shared programs: 5954954 -> 5955030 (0.00%)
> instructions in affected programs: 4212 -> 4288 (1.80%)
>
> with three programs helped (that I just added to shader-db on Monday,
> yay!) and 19 hurt, 12 significantly.

I'm surprised. The smallest program helped (188->187 instructions) did
this immediately before an endif:

-and(8) g3<1>D g2<8,8,1>D 0x3f800000UD
-mov(8) g23<1>F g3<8,8,1>F
+mov.sat(8) g23<1>F g2<8,8,1>UD

I assume register coalescing wasn't able to get rid of the extra MOV.

The most hurt shader was affected like so:

-cmp.l.f0(8) g12<1>D g6<8,8,1>F 0F
-and(8) g13<1>D g12<8,8,1>D 0x3f800000UD
+cmp.l.f0(8) g19<1>D g6<8,8,1>F 0F
+mov.sat(8) g7<1>F g19<8,8,1>UD
+mov.sat(8) g8<1>F g19<8,8,1>UD
+mov.sat(8) g9<1>F g19<8,8,1>UD

because we don't CSE MOV instructions. Fixing CSE to handle saturated
MOVs is trivial though, and after that change we're left with (ignoring
potential gen <= 5 improvements) three programs helped by one or two
instructions because of the deficiency in register coalescing.

We can compact mov.sat dst:F src:UD though, so that's an improvement
over AND 0x3f800000.

For the gl_FrontFacing case, I think using your more general approach
will actually be better.
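
As an aside on the b2f comparison above: the two lowerings can be checked
against each other with a small Python sketch. This is not Mesa code. It
assumes booleans are stored as 0 or 0xFFFFFFFF, and the helper names
(bits_to_float, b2f_and, b2f_mov_sat) are made up for illustration. It
models the AND with 0x3f800000 as a bit mask whose result is read back as
a float, and mov.sat with a UD source as an unsigned-to-float conversion
clamped to [0, 1].

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def b2f_and(b: int) -> float:
    # Models: and dst:D src:D 0x3f800000UD, with the result read as a float.
    # 0x3f800000 is the bit pattern of 1.0f.
    return bits_to_float(b & 0x3F800000)


def b2f_mov_sat(b: int) -> float:
    # Models: mov.sat dst:F src:UD, i.e. convert the unsigned integer to
    # float and clamp the result to [0.0, 1.0].
    return min(max(float(b & 0xFFFFFFFF), 0.0), 1.0)


# Assumption: a boolean is either all zeros or all ones.
for b in (0x00000000, 0xFFFFFFFF):
    assert b2f_and(b) == b2f_mov_sat(b)
```

In this model the two forms agree only on those two inputs. For b = 1 the
AND form yields 0.0 while the saturating convert yields 1.0, so the sketch
says nothing about booleans that are not fully expanded.
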
A bunch of shaders do multiple gl_FrontFacing ternaries with different
1.0/-1.0/0.0 values, and we could do two of these in 3 instructions by
eliminating one of the ASRs that expands the front-facing bit to a bool.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev