This series implements a code-generation optimization for sign(x)*y. In GLSL, sign(x) is defined as:
    Returns 1.0 if x > 0, 0.0 if x = 0, or -1.0 if x < 0.

It is silent on the NaN behavior, so I have taken it as "undefined." I don't think the new implementation will produce different results from the old one.

The optimization is only applied to the scalar backend. On Skylake, ~1,000 shaders in the VS, TCS, and TES stages are helped. It may be worth applying this to the vector backend for Haswell. I have a couple of long flights in my near future, so I might work on it then. We'll see. This might also be a good newbie project for someone wanting to get into the i965 compiler backend.

There are actually two versions of this series. The series that I am sending to the list includes "i965/fs: Eliminate dead code first". The results of that patch are not good. The other version of the series omits that patch, but it adds a bunch of horror to "i965/fs: Add a scale factor to emit_fsign". Basically, if both the fused and non-fused versions of a nir_op_fsign are emitted, copy propagation will propagate part of the common expressions, but, due to the predicated OR or XOR, one extra MOV will be left around. That single instruction ruins the whole optimization.

Both versions are available in my cgit.

List version:
https://cgit.freedesktop.org/~idr/mesa/log/?h=fsign-optimization

That branch includes a few things that I tried, but they did not pan out.

Alternate version:
https://cgit.freedesktop.org/~idr/mesa/log/?h=fsign-optimization-emit-no-dead-code

I think the version sent to the list is cleaner, but its shader-db results are not as good. The difference between the list version and the other version on Skylake is shown below. Other platforms had similarly shaped results.
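For anyone picking this up, here is a minimal C sketch of why sign(x)*y does not need a multiply. It assumes 32-bit IEEE-754 floats, uses x == 0 as the "predicate", and models the unfused fsign as a predicated OR of x's sign bit into 1.0 and the fused sign(x)*y as a predicated XOR of x's sign bit into y. The function names (ref_sign, sign_unfused, sign_times) are hypothetical, and this is not the code in the series.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits (assumes 32-bit IEEE-754 float). */
static uint32_t float_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));
   return u;
}

static float bits_float(uint32_t u)
{
   float f;
   memcpy(&f, &u, sizeof(f));
   return f;
}

/* Reference GLSL-style sign(): 1.0, 0.0, or -1.0. NaN is not special-cased. */
static float ref_sign(float x)
{
   return x > 0.0f ? 1.0f : (x < 0.0f ? -1.0f : 0.0f);
}

/* Hypothetical unfused form: OR x's sign bit onto the bits of 1.0,
 * but only when x != 0 (the predicate); otherwise produce 0.0. */
static float sign_unfused(float x)
{
   if (x == 0.0f)
      return 0.0f;
   return bits_float((float_bits(x) & 0x80000000u) | float_bits(1.0f));
}

/* Hypothetical fused form of sign(x) * y: XOR x's sign bit into y,
 * which flips y's sign when x is negative. Also predicated on x != 0. */
static float sign_times(float x, float y)
{
   if (x == 0.0f)
      return 0.0f;
   return bits_float((float_bits(x) & 0x80000000u) ^ float_bits(y));
}
```

For example, sign_times(-3.0f, -4.0f) yields 4.0f, the same as ref_sign(-3.0f) * -4.0f. For a NaN x, both hypothetical forms return a signed 1.0 or a signed y while ref_sign returns 0.0, which is consistent with treating the NaN case as undefined.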
total instructions in shared programs: 15090997 -> 15091028 (<.01%)
instructions in affected programs: 10251 -> 10282 (0.30%)
helped: 0
HURT: 26
HURT stats (abs)   min: 1 max: 4 x̄: 1.19 x̃: 1
HURT stats (rel)   min: 0.14% max: 1.96% x̄: 0.49% x̃: 0.24%
95% mean confidence interval for instructions value: 0.94 1.45
95% mean confidence interval for instructions %-change: 0.28% 0.71%
Instructions are HURT.

total cycles in shared programs: 565827580 -> 565824007 (<.01%)
cycles in affected programs: 1995745 -> 1992172 (-0.18%)
helped: 271
HURT: 248
helped stats (abs) min: 1 max: 623 x̄: 25.79 x̃: 5
helped stats (rel) min: 0.02% max: 13.19% x̄: 0.94% x̃: 0.28%
HURT stats (abs)   min: 1 max: 204 x̄: 13.78 x̃: 4
HURT stats (rel)   min: 0.01% max: 6.57% x̄: 0.52% x̃: 0.21%
95% mean confidence interval for cycles value: -11.25 -2.52
95% mean confidence interval for cycles %-change: -0.38% -0.11%
Cycles are helped.

The version sent to the list saves a couple of instructions in 26 shaders, but cycles are hurt. The list version also avoids ~65 lines of ugly code.

I also sent a couple of tests to the piglit list that exercise a bug that I had during development.

https://patchwork.freedesktop.org/patch/247911/

In the alternate version, for an expression like sign(a)*sign(b), sign(b) would never get emitted. When the fused sign(a)*x was emitted, it would explode. The solution was to just bail on the optimization when sign(a)*sign(b) is encountered. I suspect that's the source of the 32 shaders with instructions hurt in the alternate version, but I have not verified that.

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
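The bail-out on sign(a)*sign(b) described above amounts to a simple operand check before fusing. The sketch below uses a made-up, minimal expression type instead of Mesa's real NIR structures; struct expr, enum op, and fusable_fsign_src are all hypothetical and only illustrate the decision, not how the series implements it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified expression node (not nir_alu_instr). */
enum op { OP_VALUE, OP_FSIGN, OP_FMUL };

struct expr {
   enum op op;
   const struct expr *src[2]; /* unused sources are NULL */
};

/* Decide whether fmul(src0, src1) may be emitted as a fused sign(x) * y.
 * Returns the index of the fsign source to fuse, or -1 to bail. */
static int fusable_fsign_src(const struct expr *mul)
{
   if (mul->op != OP_FMUL)
      return -1;

   bool s0 = mul->src[0]->op == OP_FSIGN;
   bool s1 = mul->src[1]->op == OP_FSIGN;

   /* sign(a) * sign(b): fusing one side would leave the other fsign
    * never emitted, so skip the optimization for this multiply. */
   if (s0 && s1)
      return -1;

   if (s0)
      return 0;
   if (s1)
      return 1;
   return -1; /* no fsign operand: nothing to fuse */
}
```

With this check, fmul(sign(a), b) fuses on source 0, fmul(a, sign(b)) fuses on source 1, and fmul(sign(a), sign(b)) falls back to the ordinary unfused path.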