Kenneth Graunke <kenn...@whitecape.org> writes:

> When the vec4 backend encountered an ir_triop_lrp, it always emitted an
> actual LRP instruction, which only exists on Gen6+. Gen4-5 used
> lower_instructions() to decompose ir_triop_lrp at the IR level.
>
> Since commit 8d37e9915a3b21 ("glsl: Optimize open-coded lrp into lrp."),
> we've had a bug where lower_instructions translates ir_triop_lrp into
> arithmetic, but opt_algebraic reassembles it back into a lrp.
>
> To avoid this ordering concern, just handle ir_triop_lrp in the backend.
> The FS backend already does this, so we may as well do likewise.
>
> Cc: "10.1" <mesa-sta...@lists.freedesktop.org>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75253
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.h           |  3 +++
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 36 +++++++++++++++++++++++++++++-----
>  2 files changed, 32 insertions(+), 7 deletions(-)
>
> This patch fixes a regression from 10.0 to 10.1, and really needs to be
> cherry-picked before the final 10.1.0 release.
>
> Technically, it's the only one that needs to be cherry-picked, but I figured
> I may as well CC the whole series and leave it up to the stable maintainers.
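The pass-ordering problem in the commit message can be modeled on a toy expression tree. This is a minimal sketch, not Mesa code: `Expr`, `lower_lrp` and `reassemble_lrp` are hypothetical stand-ins for GLSL IR, lower_instructions() and the opt_algebraic rule, and it assumes the lowered form is x*(1 - a) + y*a (the same expansion the patch's Gen4-5 path emits). Because the second pass rebuilds exactly what the first one expanded, a backend running after both still receives lrp and has to handle it.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical toy IR; not Mesa's ir_expression.
enum class Op { Var, One, Add, Sub, Mul, Lrp };

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
  Op op;
  std::string name;    // set only for Op::Var
  ExprPtr s0, s1, s2;  // operands; s2 is used only by Op::Lrp
};

ExprPtr make(Op op, ExprPtr s0 = nullptr, ExprPtr s1 = nullptr, ExprPtr s2 = nullptr) {
  return std::make_shared<Expr>(Expr{op, "", s0, s1, s2});
}

ExprPtr var(const std::string &name) {
  return std::make_shared<Expr>(Expr{Op::Var, name, nullptr, nullptr, nullptr});
}

// Stand-in for lower_instructions(): lrp(x, y, a) -> x*(1 - a) + y*a.
ExprPtr lower_lrp(const ExprPtr &e) {
  if (e->op != Op::Lrp)
    return e;
  const ExprPtr &x = e->s0, &y = e->s1, &a = e->s2;
  return make(Op::Add, make(Op::Mul, x, make(Op::Sub, make(Op::One), a)),
              make(Op::Mul, y, a));
}

// Stand-in for the opt_algebraic rule: x*(1 - a) + y*a -> lrp(x, y, a).
// Operands are compared by pointer identity, which is enough for this toy.
ExprPtr reassemble_lrp(const ExprPtr &e) {
  assert(e != nullptr);
  if (e->op != Op::Add || e->s0->op != Op::Mul || e->s1->op != Op::Mul)
    return e;
  const ExprPtr &lhs = e->s0, &rhs = e->s1;
  const ExprPtr &one_minus_a = lhs->s1;
  if (one_minus_a->op == Op::Sub && one_minus_a->s0->op == Op::One &&
      one_minus_a->s1 == rhs->s1)
    return make(Op::Lrp, lhs->s0, rhs->s0, rhs->s1);
  return e;
}
```

Running `lower_lrp` and then `reassemble_lrp` on `lrp(x, y, a)` gives back `lrp(x, y, a)`, which is why handling lrp in the backend removes the dependence on pass order.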
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
> index 6bd8b80..fb5c0a6 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> @@ -506,6 +506,9 @@ public:
>
>     void emit_minmax(uint32_t condmod, dst_reg dst, src_reg src0, src_reg src1);
>
> +   void emit_lrp(const dst_reg &dst,
> +                 const src_reg &x, const src_reg &y, const src_reg &a);
> +
>     void emit_block_move(dst_reg *dst, src_reg *src,
>                          const struct glsl_type *type, uint32_t predicate);
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index 95e0064..d4f1899 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -1132,6 +1132,34 @@ vec4_visitor::emit_minmax(uint32_t conditionalmod, dst_reg dst,
>     }
>  }
>
> +void
> +vec4_visitor::emit_lrp(const dst_reg &dst,
> +                       const src_reg &x, const src_reg &y, const src_reg &a)
> +{
> +   if (brw->gen >= 6) {
> +      /* Note that the instruction's argument order is reversed from GLSL
> +       * and the IR.
> +       */
> +      emit(LRP(dst,
> +               fix_3src_operand(a), fix_3src_operand(y), fix_3src_operand(x)));
> +   } else {
> +      /* Earlier generations don't support three source operations, so we
> +       * need to emit x*(1-a) + y*a.
> +       */
> +      dst_reg y_times_a = dst_reg(this, glsl_type::vec4_type);
> +      dst_reg one_minus_a = dst_reg(this, glsl_type::vec4_type);
> +      dst_reg x_times_one_minus_a = dst_reg(this, glsl_type::vec4_type);
> +      y_times_a.writemask = dst.writemask;
> +      one_minus_a.writemask = dst.writemask;
> +      x_times_one_minus_a.writemask = dst.writemask;
> +
> +      emit(MUL(y_times_a, y, a));
> +      emit(ADD(one_minus_a, negate(a), src_reg(1.0f)));
> +      emit(MUL(x_times_one_minus_a, x, src_reg(one_minus_a)));
> +      emit(ADD(dst, src_reg(x_times_one_minus_a), src_reg(y_times_a)));
> +   }
> +}
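To make the two paths in emit_lrp concrete, here is a scalar software model of what they compute per channel, with the destination writemask applied to every temporary as the Gen4-5 branch does. `Vec4`, `emit_binop`, `lrp_gen4` and `lrp_gen6` are hypothetical names, not Mesa API. The Gen6+ model assumes LRP computes src0*src1 + (1 - src0)*src2, which is consistent with the patch passing its operands in the reversed order (a, y, x).

```cpp
#include <array>
#include <cassert>

// Hypothetical model of one 4-wide register; writemask bit c enables channel c.
using Vec4 = std::array<float, 4>;
constexpr unsigned WRITEMASK_XYZW = 0xF;

// Applies a per-channel binary op, writing only the channels enabled in mask.
template <typename F>
void emit_binop(Vec4 &dst, const Vec4 &s0, const Vec4 &s1, unsigned mask, F op) {
  assert(mask <= WRITEMASK_XYZW);
  for (int c = 0; c < 4; ++c)
    if (mask & (1u << c))
      dst[c] = op(s0[c], s1[c]);
}

// Gen4-5 branch: x*(1-a) + y*a as MUL, ADD, MUL, ADD.
void lrp_gen4(Vec4 &dst, const Vec4 &x, const Vec4 &y, const Vec4 &a, unsigned mask) {
  Vec4 y_times_a{}, one_minus_a{}, x_times_one_minus_a{};
  const Vec4 ones{1.0f, 1.0f, 1.0f, 1.0f};
  auto mul = [](float p, float q) { return p * q; };
  auto add = [](float p, float q) { return p + q; };
  emit_binop(y_times_a, y, a, mask, mul);                                       // MUL
  emit_binop(one_minus_a, a, ones, mask, [](float p, float q) { return -p + q; }); // ADD(-a, 1.0)
  emit_binop(x_times_one_minus_a, x, one_minus_a, mask, mul);                   // MUL
  emit_binop(dst, x_times_one_minus_a, y_times_a, mask, add);                   // ADD
}

// Gen6+ branch: a single LRP with sources (a, y, x), under the assumed semantics.
void lrp_gen6(Vec4 &dst, const Vec4 &x, const Vec4 &y, const Vec4 &a, unsigned mask) {
  const Vec4 &src0 = a, &src1 = y, &src2 = x;
  for (int c = 0; c < 4; ++c)
    if (mask & (1u << c))
      dst[c] = src0[c] * src1[c] + (1.0f - src0[c]) * src2[c];
}
```

With a writemask of .xyz (0x7), both models write x*(1-a) + y*a into the first three channels and leave w untouched, which is why each temporary in the Gen4-5 branch copies dst.writemask.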
I think we would do better by emitting

    ADD(y_minus_x, y, negate(x))
    MAC(dst, x, y_minus_x, a)

Then gen4/5 would also get a win from the algebraic pass that turns open-coded lrp into lrp (two instructions instead of four), just like gen6+.

Other than that, I like the series.
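The suggested ADD + MAC sequence computes x + (y - x)*a. Below is a minimal sketch comparing it with the patch's four-instruction form on plain floats. `lrp_four_ops` and `lrp_two_ops` are hypothetical helpers, and the MAC is modeled simply as dst = x + y_minus_x * a, which is how the three-operand pseudocode reads; how the Gen4-5 accumulator register would actually be loaded is not modeled.

```cpp
#include <cassert>

// The patch's Gen4-5 sequence: x*(1-a) + y*a in four operations.
float lrp_four_ops(float x, float y, float a) {
  float y_times_a = y * a;                       // MUL
  float one_minus_a = -a + 1.0f;                 // ADD(negate(a), 1.0)
  float x_times_one_minus_a = x * one_minus_a;   // MUL
  return x_times_one_minus_a + y_times_a;        // ADD
}

// The suggested sequence: x + (y - x)*a in two operations.
float lrp_two_ops(float x, float y, float a) {
  float y_minus_x = y + -x;                      // ADD(y, negate(x))
  return x + y_minus_x * a;                      // MAC, modeled as multiply then add
}
```

Both give 2.5f for (x, y, a) = (2.0f, 4.0f, 0.25f). One trade-off to weigh: the two-operation form is not exact at a = 1 when |x| is much larger than |y|. For example, lrp_two_ops(1e20f, 1.0f, 1.0f) returns 0.0f because y - x rounds to -x, while lrp_four_ops returns exactly 1.0f at that endpoint.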
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev