I think this is being both too aggressive and not aggressive enough.

First, the too-aggressive part. Imagine we have something like:
   a = mul b, c
   d = add a, 1.0
   e = add a, 2.0
   f = add a, 3.0

In this case, we don't want to turn this into a series of three MADs, since the three load-immediate instructions are going to hurt us more than eliminating the mul helps. When the only thing using the mul is a bunch of adds, fusing is only profitable when there is at most one distinct immediate that isn't already used by other MADs; any more and we gain instructions.

Also, I think we might still want to fuse the multiply and the add even if the mul is used somewhere else, since it at least gives the scheduler some extra freedom to hide latency, as long as we don't have to emit any load-immediate instructions.

Of course, this is all very backend-specific, so at this point it would make sense to move this into an i965 pass and leave the general ffma peephole, without these restrictions, in place for everyone else.

On Mon, Mar 23, 2015 at 11:13 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> shader-db results for fragment shaders on Haswell:
> total instructions in shared programs: 4395688 -> 4389623 (-0.14%)
> instructions in affected programs:     355876 -> 349811 (-1.70%)
> helped:                                1455
> HURT:                                  14
> GAINED:                                5
> LOST:                                  0
> ---
>  src/glsl/nir/nir_opt_peephole_ffma.c | 41 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
>
> diff --git a/src/glsl/nir/nir_opt_peephole_ffma.c b/src/glsl/nir/nir_opt_peephole_ffma.c
> index 4674505..b875bb6 100644
> --- a/src/glsl/nir/nir_opt_peephole_ffma.c
> +++ b/src/glsl/nir/nir_opt_peephole_ffma.c
> @@ -38,6 +38,41 @@ struct peephole_ffma_state {
>     bool progress;
>  };
>
> +static inline bool
> +are_all_uses_fadd(nir_ssa_def *def)
> +{
> +   if (def->if_uses->entries > 0)
> +      return false;
> +
> +   struct set_entry *use_iter;
> +   set_foreach(def->uses, use_iter) {
> +      nir_instr *use_instr = (nir_instr *)use_iter->key;
> +
> +      if (use_instr->type != nir_instr_type_alu)
> +         return false;
> +
> +      nir_alu_instr *use_alu = nir_instr_as_alu(use_instr);
> +      switch (use_alu->op) {
> +      case nir_op_fadd:
> +         break; /* This one's ok */
> +
> +      case nir_op_imov:
> +      case nir_op_fmov:
> +      case nir_op_fneg:
> +      case nir_op_fabs:
> +         assert(use_alu->dest.dest.is_ssa);
> +         if (!are_all_uses_fadd(&use_alu->dest.dest.ssa))
> +            return false;
> +         break;
> +
> +      default:
> +         return false;
> +      }
> +   }
> +
> +   return true;
> +}
> +
>  static inline nir_alu_instr *
>  get_mul_for_src(nir_alu_src *src, uint8_t swizzle[4], bool *negate, bool *abs)
>  {
> @@ -66,6 +101,12 @@ get_mul_for_src(nir_alu_src *src, uint8_t swizzle[4], bool *negate, bool *abs)
>        break;
>
>     case nir_op_fmul:
> +      /* Only absorb an fmul into an ffma if the fmul is only used by fadd
> +       * operations.  This prevents us from being too aggressive with our
> +       * fusing, which can actually lead to more instructions.
> +       */
> +      if (!are_all_uses_fadd(&alu->dest.dest.ssa))
> +         return NULL;
>        break;
>
>     default:
> --
> 2.3.3
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev