On Fri, Aug 23, 2013 at 9:24 AM, Paul Berry <stereotype...@gmail.com> wrote: > On 22 August 2013 16:08, Matt Turner <matts...@gmail.com> wrote: >> >> --- >> src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp | 1 + >> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 6 ++++++ >> src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 6 ++++++ >> 3 files changed, 13 insertions(+) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp >> b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp >> index 6ee6d01..34dbc90 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp >> @@ -362,6 +362,7 @@ >> ir_channel_expressions_visitor::visit_leave(ir_assignment *ir) >> >> case ir_triop_fma: >> case ir_triop_lrp: >> + case ir_triop_cond_sel: >> case ir_triop_bitfield_extract: >> for (i = 0; i < vector_elements; i++) { >> ir_rvalue *op0 = get_element(op_var[0], i); >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >> index 4b54bee..27887d6 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >> @@ -735,6 +735,12 @@ fs_visitor::visit(ir_expression *ir) >> case ir_triop_lrp: >> emit_lrp(this->result, op[0], op[1], op[2]); >> break; >> + >> + case ir_triop_cond_sel: >> + emit(CMP(reg_null_d, op[0], fs_reg(0), BRW_CONDITIONAL_NZ)); >> + inst = emit(BRW_OPCODE_SEL, this->result, op[1], op[2]); >> + inst->predicate = BRW_PREDICATE_NORMAL; >> + break; >> } >> } > > > For the uses of ir_triop_cond_sel we have currently (the lowering passes in > patch 14), I believe this will generate efficient code. But if we adapt the > mix() functions to use it, then there are probably going to be a lot of uses > like this: > > x = mix(y, z, a < b); > > Which will compile down to this silly assembly (please excuse the > pseudocode--I can't remember the assembly syntax exactly): > > CMP.lt tmp a b > CMP.nz null tmp 0 > SEL.f0 x y z > > What if we modify the loop that calls ir->operands[operand]->accept(this) on > each operand (near the top of the visitor) so that it skips operand 0 when > the expression is ir_triop_cond_sel. Then, in the switch statement, we can > do something like this: > > emit_bool_to_cond_code(ir->operands[0]); > > inst = emit(BRW_OPCODE_SEL, this->result, op[1], op[2]); > inst->predicate = BRW_PREDICATE_NORMAL; > > That should produce the assembly we want, which is: > > CMP.lt null a b > SEL.fo x y z
I agree. I tried to do this and kept running into problems that would be solved by considering the flags registers better in various optimization passes. Using the code from fs-mix-bvec4-infnan.shader_test for example (gl_FragColor = mix(arg0, arg1, selector);). We generate something like the following with some local changes: cmp.e.f0(8) null g2.4<0,1,0>D 0D (+f0) sel(8) m1<1>F g2<0,1,0>F g4<8,8,1>F cmp.e.f0(8) null g2.5<0,1,0>D 0D (+f0) sel(8) m2<1>F g2.1<0,1,0>F g7<8,8,1>F cmp.e.f0(8) null g2.6<0,1,0>D 0D (+f0) sel(8) m3<1>F g4<8,8,1>F g2<0,1,0>F cmp.e.f0(8) null g2.7<0,1,0>D 0D (+f0) sel(8) m4<1>F g7<8,8,1>F g2.1<0,1,0>F which would benefit from using the multiple flag registers available on IVB and newer (or even the subregisters, not sure how hardware dependency tracking works for them). And from the code generated for frexp is stupid: cmp.ne.f0(8) g5<1>D (abs)g2<0,1,0>F 0F ... cmp.e.f0(8) null g5<8,8,1>D 0D (-f0) sel(8) g9<1>D g8<8,8,1>D -126D ... cmp.e.f0(8) null g5<8,8,1>D 0D (-f0) sel(8) g12<1>UD g11<8,8,1>UD 0x3f000000UD where we could just evaluate the abs(x) != 0.0f and use that flag register for the two sels. Basically, I think what I'm saying is yes, the generated code is silly, but that I think our backend really needs to understand about flags register to make it significantly better. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev