From: Ian Romanick <ian.d.roman...@intel.com> None of the callers actually wanted what it did. In ptn_xpd, you only ever want a vec3 swizzle. In ptn_tex, you want a swizzle that matches the number of required texture coordinates.
shader-db results: G45: total instructions in shared programs: 4011240 -> 4010911 (-0.01%) instructions in affected programs: 59232 -> 58903 (-0.56%) helped: 114 HURT: 0 total cycles in shared programs: 84314194 -> 84313220 (-0.00%) cycles in affected programs: 779150 -> 778176 (-0.13%) helped: 110 HURT: 13 Ironlake: total instructions in shared programs: 6397262 -> 6396605 (-0.01%) instructions in affected programs: 117402 -> 116745 (-0.56%) helped: 227 HURT: 0 total cycles in shared programs: 128889798 -> 128888524 (-0.00%) cycles in affected programs: 1214644 -> 1213370 (-0.10%) helped: 179 HURT: 44 Sandy Bridge: total instructions in shared programs: 8467391 -> 8467384 (-0.00%) instructions in affected programs: 3107 -> 3100 (-0.23%) helped: 10 HURT: 6 total cycles in shared programs: 117580120 -> 117573448 (-0.01%) cycles in affected programs: 103158 -> 96486 (-6.47%) helped: 84 HURT: 11 Ivy Bridge: total instructions in shared programs: 7774255 -> 7774258 (0.00%) instructions in affected programs: 1677 -> 1680 (0.18%) helped: 8 HURT: 6 total cycles in shared programs: 65743828 -> 65739190 (-0.01%) cycles in affected programs: 89312 -> 84674 (-5.19%) helped: 78 HURT: 23 Haswell: total instructions in shared programs: 7107172 -> 7107150 (-0.00%) instructions in affected programs: 2048 -> 2026 (-1.07%) helped: 16 HURT: 0 total cycles in shared programs: 64653636 -> 64647486 (-0.01%) cycles in affected programs: 86836 -> 80686 (-7.08%) helped: 85 HURT: 17 Broadwell and Skylake: total instructions in shared programs: 8447529 -> 8447507 (-0.00%) instructions in affected programs: 2038 -> 2016 (-1.08%) helped: 16 HURT: 0 total cycles in shared programs: 66418670 -> 66413416 (-0.01%) cycles in affected programs: 90110 -> 84856 (-5.83%) helped: 83 HURT: 20 Signed-off-by: Ian Romanick <ian.d.roman...@intel.com> --- Here's the rub with this patch... some shaders have instruction counts changed (some decrease, some increase) due to a cascade of effects. First, by not having some components "used" as TEX sources, some multiply results are only used once. These allows the multiplies to be fused with later adds to form ffmas. The added MADs generated in the i965 backend can either drastically improve things or drastically damage things. Below are two results. Shader names removed to protect the innocent. cycles helped: shaders/closed/game-A/fp-794.shader_test FS SIMD16: 830 -> 646 (-22.17%) cycles HURT: shaders/closed/game-B/fp-215.shader_test FS SIMD16: 2098 -> 3094 (47.47%) Neither of these shaders had their instruction counts change, but the number MADs (which are doubled-up in SIMD16) change. With the added MADs, the scheduler chooses a really, really bad ordering for the TEX instructions. Since these are ARB_fragment_program shaders, there are no loops. I believe the relative change in the cycle counts. src/mesa/program/prog_to_nir.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/mesa/program/prog_to_nir.c b/src/mesa/program/prog_to_nir.c index ce25f6d..a6119ae 100644 --- a/src/mesa/program/prog_to_nir.c +++ b/src/mesa/program/prog_to_nir.c @@ -59,7 +59,6 @@ struct ptn_compile { #define SWIZ(X, Y, Z, W) \ (unsigned[4]){ SWIZZLE_##X, SWIZZLE_##Y, SWIZZLE_##Z, SWIZZLE_##W } -#define ptn_swizzle(b, src, x, y, z, w) nir_swizzle(b, src, SWIZ(x, y, z, w), 4, true) #define ptn_channel(b, src, ch) nir_swizzle(b, src, SWIZ(ch, ch, ch, ch), 1, true) static nir_ssa_def * @@ -491,11 +490,11 @@ ptn_xpd(nir_builder *b, nir_alu_dest dest, nir_ssa_def **src) ptn_move_dest_masked(b, dest, nir_fsub(b, nir_fmul(b, - ptn_swizzle(b, src[0], Y, Z, X, X), - ptn_swizzle(b, src[1], Z, X, Y, X)), + nir_swizzle(b, src[0], SWIZ(Y, Z, X, W), 3, true), + nir_swizzle(b, src[1], SWIZ(Z, X, Y, W), 3, true)), nir_fmul(b, - ptn_swizzle(b, src[1], Y, Z, X, X), - ptn_swizzle(b, src[0], Z, X, Y, X))), + nir_swizzle(b, src[1], SWIZ(Y, Z, X, W), 3, true), + nir_swizzle(b, src[0], SWIZ(Z, X, Y, W), 3, true))), WRITEMASK_XYZ); ptn_move_dest_masked(b, dest, nir_imm_float(b, 1.0), WRITEMASK_W); } @@ -642,7 +641,8 @@ ptn_tex(nir_builder *b, nir_alu_dest dest, nir_ssa_def **src, unsigned src_number = 0; instr->src[src_number].src = - nir_src_for_ssa(ptn_swizzle(b, src[0], X, Y, Z, W)); + nir_src_for_ssa(nir_swizzle(b, src[0], SWIZ(X, Y, Z, W), + instr->coord_components, true)); instr->src[src_number].src_type = nir_tex_src_coord; src_number++; -- 2.5.5 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev