On Mon, Mar 23, 2015 at 8:40 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote: > On Mon, Mar 23, 2015 at 8:34 PM, Matt Turner <matts...@gmail.com> wrote: >> On Mon, Mar 23, 2015 at 8:13 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote: >>> The idea here is that fusing multiply-add combinations too early can reduce >>> our ability to perform CSE and value-numbering. Instead, we split ffma >>> opcodes up-front, hope CSE cleans up, and then fuse after-the-fact. >>> Unless an algebraic pass does something silly where it inserts something >>> between the multiply and the add, splitting and re-fusing should never >>> cause a problem. We run the late algebraic optimizations after this so >>> that things like compare-with-zero don't hurt our ability to fuse things. >>> >>> shader-db results for fragment shaders on Haswell: >>> total instructions in shared programs: 4390538 -> 4379236 (-0.26%) >>> instructions in affected programs: 989359 -> 978057 (-1.14%) >>> helped: 5308 >>> HURT: 97 >>> GAINED: 78 >>> LOST: 5 >>> >>> This does, unfortunately, cause some substantial hurt to a shader in Kerbal >>> Space Program. However, the damage is caused by changing a single >>> instruction from a ffma to an add. This, in turn, *decreases* register >>> pressure in one part of the program causing it to fail to register allocate >>> and spill. Given the overwhelmingly positive results in other shaders and >>> the fact that the NIR for the Kerbal shaders is actually better, this >>> should be considered a positive. >>> --- >>> src/mesa/drivers/dri/i965/brw_context.c | 1 + >>> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 10 ++++++++-- >>> 2 files changed, 9 insertions(+), 2 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_context.c >>> b/src/mesa/drivers/dri/i965/brw_context.c >>> index ed6fdff..7cab1c9 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_context.c >>> +++ b/src/mesa/drivers/dri/i965/brw_context.c >>> @@ -558,6 +558,7 @@ brw_initialize_context_constants(struct brw_context >>> *brw) >>> >>> static const nir_shader_compiler_options gen6_nir_options = { >>> .native_integers = true, >>> + .lower_ffma = true, >> >> I see what's going on, but this probably deserves a comment. > > Sure. How about: > >> In order to help allow for better CSE at the NIR level we tell NIR to split >> all ffma instructions during opt_algebraic and we then re-combine them as a >> later step.
Sounds good. > >>> }; >>> >>> /* We want the GLSL compiler to emit code that uses condition codes */ >>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>> index cdf9b01..b0b86e3 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>> @@ -51,8 +51,6 @@ nir_optimize(nir_shader *nir) >>> nir_validate_shader(nir); >>> progress |= nir_opt_algebraic(nir); >>> nir_validate_shader(nir); >>> - progress |= nir_opt_peephole_ffma(nir); >>> - nir_validate_shader(nir); >>> progress |= nir_opt_constant_folding(nir); >>> nir_validate_shader(nir); >>> progress |= nir_opt_remove_phis(nir); >>> @@ -131,6 +129,12 @@ fs_visitor::emit_nir_code() >>> >>> nir_optimize(nir); >>> >>> + if (brw->gen >= 6) { >>> + /* Try and fuse multiply-adds */ >>> + nir_opt_peephole_ffma(nir); >>> + nir_validate_shader(nir); >>> + } >>> + >>> nir_opt_algebraic_late(nir); >>> nir_validate_shader(nir); >>> >>> @@ -141,6 +145,8 @@ fs_visitor::emit_nir_code() >>> nir_validate_shader(nir); >>> nir_copy_prop(nir); >>> nir_validate_shader(nir); >>> + nir_opt_dce(nir); >>> + nir_validate_shader(nir); >> >> Just to confirm: this did help something? > > We should always run dead code after copy prop and we weren't. > However, that should probably be its own patch. I'll split it out if > you'd like. Seems like the right thing to do. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev