From: Ian Romanick <ian.d.roman...@intel.com>

For some reason, if I did not move the regular lowering to late
optimizations, the new lowering would never trigger.  This also means
that the fsub lowering had to be added to late optimizations, and this
requires "intel/compiler: Repeat nir_opt_algebraic_late until no more
progress".

The loops removed by this patch are the same loops added by
"intel/compiler: Don't emit flrp for Gen4 or Gen5".

I am CC'ing people who are responsible for drivers that set
lower_flrp32 as this patch will likely affect shader-db results for
those drivers.

No changes on any Gen6+ platform.

Iron Lake
total instructions in shared programs: 7730019 -> 7731893 (0.02%)
instructions in affected programs: 139980 -> 141854 (1.34%)
helped: 262
HURT: 329
helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.11% max: 4.69% x̄: 1.70% x̃: 1.30%
HURT stats (abs)   min: 1 max: 19 x̄: 8.09 x̃: 7
HURT stats (rel)   min: 0.32% max: 23.53% x̄: 5.10% x̃: 4.74%
95% mean confidence interval for instructions value: 2.62 3.72
95% mean confidence interval for instructions %-change: 1.73% 2.44%
Instructions are HURT.

total cycles in shared programs: 177866190 -> 177851638 (<.01%)
cycles in affected programs: 18970354 -> 18955802 (-0.08%)
helped: 1700
HURT: 962
helped stats (abs) min: 2 max: 70 x̄: 17.40 x̃: 16
helped stats (rel) min: <.01% max: 3.36% x̄: 0.37% x̃: 0.23%
HURT stats (abs)   min: 2 max: 114 x̄: 15.62 x̃: 6
HURT stats (rel)   min: <.01% max: 10.50% x̄: 0.98% x̃: 0.39%
95% mean confidence interval for cycles value: -6.33 -4.60
95% mean confidence interval for cycles %-change: 0.07% 0.16%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total loops in shared programs: 854 -> 850 (-0.47%)
loops in affected programs: 4 -> 0
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.

GM45
total instructions in shared programs: 4769335 -> 4770019 (0.01%)
instructions in affected programs: 90821 -> 91505 (0.75%)
helped: 219
HURT: 167
helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.11% max: 4.35% x̄: 1.70% x̃: 1.30%
HURT stats (abs)   min: 1 max: 19 x̄: 8.02 x̃: 7
HURT stats (rel)   min: 0.32% max: 22.86% x̄: 4.95% x̃: 4.57%
95% mean confidence interval for instructions value: 1.12 2.43
95% mean confidence interval for instructions %-change: 0.77% 1.59%
Instructions are HURT.

total cycles in shared programs: 121980262 -> 121970888 (<.01%)
cycles in affected programs: 12861602 -> 12852228 (-0.07%)
helped: 1040
HURT: 492
helped stats (abs) min: 2 max: 70 x̄: 17.65 x̃: 16
helped stats (rel) min: <.01% max: 3.36% x̄: 0.32% x̃: 0.21%
HURT stats (abs)   min: 2 max: 114 x̄: 18.26 x̃: 6
HURT stats (rel)   min: <.01% max: 10.50% x̄: 1.00% x̃: 0.35%
95% mean confidence interval for cycles value: -7.34 -4.89
95% mean confidence interval for cycles %-change: 0.05% 0.17%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total loops in shared programs: 631 -> 629 (-0.32%)
loops in affected programs: 2 -> 0
helped: 2
HURT: 0

Signed-off-by: Ian Romanick <ian.d.roman...@intel.com>
Cc: Marek Olšák <marek.ol...@amd.com>
Cc: Rob Clark <robdcl...@gmail.com>
Cc: Eric Anholt <e...@anholt.net>
---
 src/compiler/nir/nir_opt_algebraic.py | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index f11a987c462..54f901e6cad 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -120,8 +120,6 @@ optimizations = [
    (('flrp@64', 1.0, b, c), ('fadd', ('fsub', 1.0, c), ('fmul', b, c)), 'options->lower_flrp64'),
    (('flrp@32', a, 1.0, c), ('fadd', a, ('fmul', c, ('fsub', 1.0, a))), 'options->lower_flrp32'),
    (('flrp@64', a, 1.0, c), ('fadd', a, ('fmul', c, ('fsub', 1.0, a))), 'options->lower_flrp64'),
-   (('flrp@32', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp32'),
-   (('flrp@64', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp64'),
    (('ffract', a), ('fsub', a, ('ffloor', a)), 'options->lower_ffract'),
    (('~fadd', ('fmul', a, ('fadd', 1.0, ('fneg', ('b2f', c)))), ('fmul', b, ('b2f', c))), ('bcsel', c, b, a), 'options->lower_flrp32'),
    (('~fadd@32', ('fmul', a, ('fadd', 1.0, ('fneg', c ))), ('fmul', b, c )), ('flrp', a, b, c), '!options->lower_flrp32'),
@@ -134,6 +132,30 @@ optimizations = [
    (('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'),
    (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'),
 
+   # flrp(a, b, c) * flrp(d, e, c)
+   # (a(1-c) + bc) * (d(1-c) + ec)
+   #
+   # Since (1-c) is common, it is one operation less than the other
+   # expansion.
+   (('fmul', ('flrp@32', a, b, c), ('flrp@32', d, 'e', c)),
+    ('fmul', ('fadd', ('fmul', a, ('fsub', 1.0, c)), ('fmul', 'b', c)),
+             ('fadd', ('fmul', d, ('fsub', 1.0, c)), ('fmul', 'e', c))),
+    'options->lower_flrp32'),
+   (('fmul', ('flrp@64', a, b, c), ('flrp@64', d, 'e', c)),
+    ('fmul', ('fadd', ('fmul', a, ('fsub', 1.0, c)), ('fmul', 'b', c)),
+             ('fadd', ('fmul', d, ('fsub', 1.0, c)), ('fmul', 'e', c))),
+    'options->lower_flrp64'),
+
+   # (f * flrp(a, b, c)) * flrp(d, e, c)
+   (('fmul', ('fmul', 'f', ('flrp@32', a, b, c)), ('flrp@32', d, 'e', c)),
+    ('fmul', ('fmul', 'f', ('fadd', ('fmul', a, ('fsub', 1.0, c)), ('fmul', 'b', c))),
+             ('fadd', ('fmul', d, ('fsub', 1.0, c)), ('fmul', 'e', c))),
+    'options->lower_flrp32'),
+   (('fmul', ('fmul', 'f', ('flrp@64', a, b, c)), ('flrp@64', d, 'e', c)),
+    ('fmul', ('fmul', 'f', ('fadd', ('fmul', a, ('fsub', 1.0, c)), ('fmul', 'b', c))),
+             ('fadd', ('fmul', d, ('fsub', 1.0, c)), ('fmul', 'e', c))),
+    'options->lower_flrp64'),
+
    (('fdot4', ('vec4', a, b, c, 1.0), d), ('fdph', ('vec3', a, b, c), d)),
    (('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)),
    (('fdot4', ('vec4', a, b, 0.0, 0.0), c), ('fdot2', ('vec2', a, b), c)),
@@ -887,6 +909,10 @@ late_optimizations = [
 
    # Lowered for backends without a dedicated b2f instruction
    (('b2f@32', a), ('iand', a, 1.0), 'options->lower_b2f'),
+
+   (('flrp@32', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp32'),
+   (('flrp@64', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp64'),
+   (('fsub', a, b), ('fadd', a, ('fneg', b)), 'options->lower_sub'),
 ]
 
 print(nir_algebraic.AlgebraicPass("nir_opt_algebraic", optimizations).render())
-- 
2.14.4

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
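For anyone who wants to sanity-check the algebra, here is a minimal sketch in plain Python, not part of the patch. It compares the general lowering that moves to late_optimizations, a + c(b - a), with the expanded product used by the new fmul-of-two-flrp patterns, where (1 - c) is computed once. All helper names below are hypothetical and exist only for this illustration. It uses double-precision Python floats, so it says nothing about how either form rounds on real hardware.

```python
# Illustrative check only; these helper names are hypothetical, not Mesa API.
import random


def flrp_reference(a, b, c):
    # Definition used in the patch comment: a(1-c) + bc
    return a * (1.0 - c) + b * c


def flrp_late_lowering(a, b, c):
    # Mirrors ('fadd', ('fmul', c, ('fsub', b, a)), a)
    return c * (b - a) + a


def product_expanded(a, b, c, d, e):
    # Mirrors the new fmul-of-two-flrp replacement; (1 - c) is computed once.
    one_minus_c = 1.0 - c
    return (a * one_minus_c + b * c) * (d * one_minus_c + e * c)


def max_abs_difference(samples=1000, seed=0):
    # Largest gap between the two forms over random inputs in [-1, 1].
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        a, b, c, d, e = (rng.uniform(-1.0, 1.0) for _ in range(5))
        lowered = flrp_late_lowering(a, b, c) * flrp_late_lowering(d, e, c)
        worst = max(worst, abs(lowered - product_expanded(a, b, c, d, e)))
    return worst


if __name__ == "__main__":
    print(max_abs_difference())
```

The two forms agree up to floating-point rounding, which is all this check establishes. Instruction counts and cycles still have to come from shader-db, as in the results above.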