On 11.03.2014 01:23, Ian Romanick wrote:
> I had a pretty similar patch on the top of my pow-optimization branch.
> I also expand x**3 and x**4. I had hoped that would enable some cases
> to expand then merge to MADs. It should also be faster on older GENs
> where POW perf sucks. I didn't send it out because I wanted to add a
> similar optimization in the back end that would turn x*x*x*x back into
> x**4 on GPUs where the POW would be faster.

I have no idea what performance POW has on newer Intel GPU hardware (in
contrast to older pre-SNB hardware with a separate mathbox, the manual
doesn't list throughput for extended math functions, or at least I never
found it), but I find it highly unlikely that a POW costs less than
2 muls anywhere.
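
To make the multiply counts concrete, here is a rough sketch of the expansions being discussed. The helper names (expand_pow2, expand_pow3, expand_pow4, check_expansions) are hypothetical and are not Mesa code; this is plain C++ on floats rather than GLSL IR, and it only illustrates the arithmetic, not any hardware cost.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers for illustration only; none of these exist in Mesa.

// x**2 as a single multiply, the case the quoted patch handles.
static float expand_pow2(float x) { return x * x; }

// x**3 as two multiplies.
static float expand_pow3(float x) { return x * x * x; }

// x**4 as two multiplies, by squaring once and reusing the result.
static float expand_pow4(float x)
{
   const float x2 = x * x;
   return x2 * x2;
}

// Assert that each expansion agrees with std::pow within a small
// relative tolerance (the expansions are not guaranteed bit-identical).
static void check_expansions(float x)
{
   const float tol = 1e-5f;
   auto close = [tol](float a, float b) {
      return std::fabs(a - b) <= tol * std::fmax(1.0f, std::fabs(b));
   };
   assert(close(expand_pow2(x), std::pow(x, 2.0f)));
   assert(close(expand_pow3(x), std::pow(x, 3.0f)));
   assert(close(expand_pow4(x), std::pow(x, 4.0f)));
}
```

Written this way, x**4 needs the same two multiplies as x**3, which is the cost a POW would have to beat.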
Roland

> I also didn't have anything in shader-db that benefitted from x**2 or
> x**3. It seems like there were a couple that would be modified by a
> x**5 flattening, but I think that would universally be slower....
>
> On 03/10/2014 03:54 PM, Matt Turner wrote:
>> Cuts two instructions out of SynMark's Gl32VSInstancing benchmark.
>> ---
>>  src/glsl/opt_algebraic.cpp | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>> index 5c49a78..8494bd9 100644
>> --- a/src/glsl/opt_algebraic.cpp
>> +++ b/src/glsl/opt_algebraic.cpp
>> @@ -528,6 +528,14 @@ ir_algebraic_visitor::handle_expression(ir_expression
>> *ir)
>>        if (is_vec_two(op_const[0]))
>>           return expr(ir_unop_exp2, ir->operands[1]);
>>
>> +      if (is_vec_two(op_const[1])) {
>> +         ir_variable *x = new(ir) ir_variable(ir->operands[1]->type, "x",
>> +                                              ir_var_temporary);
>> +         base_ir->insert_before(x);
>> +         base_ir->insert_before(assign(x, ir->operands[0]));
>> +         return mul(x, x);
>> +      }
>> +
>>        break;
>>
>>     case ir_unop_rcp:
>>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev