When both fadd and fmul instructions have at least one operand that is a constant and it is only used once, the total number of instructions can be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd); because the constants will be progagated as immediate operands of fmul and fadd.
This patch modifies opt_peephole_ffma pass to detect this situation and bails-out fusing fmul+fadd into ffma. As shown in shader-db results below, it seems to help a good bunch. However, there are some caveats: * It seems i965 specific, so I'm not sure if modifying the NIR pass directly is desired, as opposed to moving this to the backend. * There are still a high number of HURTs, but these could be reduced by being more specific in the conditions to bailout. total instructions in shared programs: 1683959 -> 1677447 (-0.39%) instructions in affected programs: 604918 -> 598406 (-1.08%) helped: 4633 HURT: 804 GAINED: 0 LOST: 0 --- src/glsl/nir/nir_opt_peephole_ffma.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/src/glsl/nir/nir_opt_peephole_ffma.c b/src/glsl/nir/nir_opt_peephole_ffma.c index 4f0f0da..da47f8f 100644 --- a/src/glsl/nir/nir_opt_peephole_ffma.c +++ b/src/glsl/nir/nir_opt_peephole_ffma.c @@ -133,6 +133,28 @@ get_mul_for_src(nir_alu_src *src, int num_components, return alu; } +/** + * Given a list of (at least two) nir_alu_src's, tells if any of them is a + * constant value and is used only once. + */ +static bool +any_alu_src_is_a_constant(nir_alu_src srcs[]) +{ + for (unsigned i = 0; i < 2; i++) { + if (srcs[i].src.ssa->parent_instr->type == nir_instr_type_load_const) { + nir_load_const_instr *load_const = + nir_instr_as_load_const (srcs[i].src.ssa->parent_instr); + + if (list_length(&load_const->def.uses) == 1 && + list_length(&load_const->def.if_uses) == 0) { + return true; + } + } + } + + return false; +} + static bool nir_opt_peephole_ffma_block(nir_block *block, void *void_state) { @@ -183,6 +205,15 @@ nir_opt_peephole_ffma_block(nir_block *block, void *void_state) mul_src[0] = mul->src[0].src.ssa; mul_src[1] = mul->src[1].src.ssa; + /* If any of the operands of the fmul and any of the fadd is a constant, + * we bypass because it will be more efficient as the constants will be + * propagated as operands, potentially saving two load_const instructions. + */ + if (any_alu_src_is_a_constant(mul->src) && + any_alu_src_is_a_constant(add->src)) { + continue; + } + if (abs) { for (unsigned i = 0; i < 2; i++) { nir_alu_instr *abs = nir_alu_instr_create(state->mem_ctx, -- 2.4.6 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev