Kenneth Graunke <kenn...@whitecape.org> writes: > On 09/22/2012 02:04 PM, Eric Anholt wrote: >> This makes a giant pile of code newly dead. It also fixes TXB on newer >> chipsets, which has been totally broken (I now have a piglit test for that). >> It passes the same set of Ian's ARB_fragment_program tests. It also improves >> high-settings ETQW performance by 3.2 +/- 1.9% (n=3), thanks to better >> optimization and having 8-wide along with 16-wide shaders. >> --- >> src/mesa/drivers/dri/i965/Makefile.sources | 1 + >> src/mesa/drivers/dri/i965/brw_fs.cpp | 36 +- >> src/mesa/drivers/dri/i965/brw_fs.h | 30 +- >> src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 22 +- >> src/mesa/drivers/dri/i965/brw_fs_fp.cpp | 781 >> ++++++++++++++++++++++++++ >> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 +- >> src/mesa/drivers/dri/i965/brw_wm.c | 58 +- >> src/mesa/drivers/dri/i965/brw_wm_state.c | 19 +- >> src/mesa/drivers/dri/i965/gen6_wm_state.c | 8 +- >> src/mesa/drivers/dri/i965/gen7_wm_state.c | 8 +- >> 10 files changed, 857 insertions(+), 109 deletions(-) >> create mode 100644 src/mesa/drivers/dri/i965/brw_fs_fp.cpp > > I think the LIT code may be broken (comments inline), and one comment is > wrong. Assuming you fix (or refute) those, then patches 1-8 are: > Reviewed-by: Kenneth Graunke <kenn...@whitecape.org> > > I haven't read through 9 and 10 yet, but I plan to soon.
>> +void >> +fs_visitor::emit_fragment_program_code() >> +{ >> + setup_fp_regs(); >> + >> + fs_reg null = fs_reg(brw_null_reg()); >> + >> + /* Keep a reg with 0.0 around, for reuse use by emit_sop so that it can > > "Keep a reg with 1.0 around, for reuse by emit_fp_sop" > ^^^ (not 0.0) ^^ (function name) fixed >> + case OPCODE_DP2: >> + case OPCODE_DP3: >> + case OPCODE_DP4: >> + case OPCODE_DPH: { >> + fs_reg mul = fs_reg(this, glsl_type::float_type); >> + fs_reg acc = fs_reg(this, glsl_type::float_type); >> + int count; >> + >> + switch (fpi->Opcode) { >> + case OPCODE_DP2: count = 2; break; >> + case OPCODE_DP3: count = 3; break; >> + case OPCODE_DP4: count = 4; break; >> + case OPCODE_DPH: count = 3; break; >> + default: assert(!"not reached"); count = 0; break; >> + } >> + >> + emit(BRW_OPCODE_MUL, acc, >> + regoffset(src[0], 0), regoffset(src[1], 0)); >> + for (int i = 1; i < count; i++) { >> + emit(BRW_OPCODE_MUL, mul, >> + regoffset(src[0], i), regoffset(src[1], i)); >> + emit(BRW_OPCODE_ADD, acc, acc, mul); >> + } > > Future optimization: MAD would be nice here, but that can be done later. Yeah, or even MACs. This is a codegen quality regression from brw_wm_*.c. >> + case OPCODE_LIT: >> + /* From the ARB_fragment_program spec: >> + * >> + * tmp = VectorLoad(op0); >> + * if (tmp.x < 0) tmp.x = 0; >> + * if (tmp.y < 0) tmp.y = 0; >> + * if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon); >> + * else if (tmp.w > 128-epsilon) tmp.w = 128-epsilon; >> + * result.x = 1.0; >> + * result.y = tmp.x; >> + * result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : >> 0.0; >> + * result.w = 1.0; >> + */ >> + if (fpi->DstReg.WriteMask & WRITEMASK_X) >> + emit(BRW_OPCODE_MOV, regoffset(dst, 0), fs_reg(1.0f)); >> + >> + if (fpi->DstReg.WriteMask & WRITEMASK_YZ) { >> + fs_inst *inst; >> + inst = emit(BRW_OPCODE_CMP, null, >> + regoffset(src[0], 0), fs_reg(0.0f)); >> + inst->conditional_mod = BRW_CONDITIONAL_LE; >> + >> + if (fpi->DstReg.WriteMask & WRITEMASK_Y) { >> + emit(BRW_OPCODE_MOV, regoffset(dst, 1), regoffset(src[0], >> 0)); >> + inst = emit(BRW_OPCODE_MOV, regoffset(dst, 1), fs_reg(0.0f)); >> + inst->predicated = true; >> + } >> + >> + if (fpi->DstReg.WriteMask & WRITEMASK_Z) { >> + emit_math(SHADER_OPCODE_POW, regoffset(dst, 2), >> + regoffset(src[0], 1), regoffset(src[0], 3)); >> + >> + inst = emit(BRW_OPCODE_MOV, regoffset(dst, 2), fs_reg(0.0f)); >> + inst->predicated = true; > > This looks broken...don't you need to handle clamping to (-128, 128)? Ah, I lifted this code directly from the former backend. I'll put in a note that I didn't change behavior. I don't know of any use for the -128,128 clamping. I think it's just a spec artifact from this extension being written for a particular hardware instruction set. We should probably fix it for thoroughness, but I don't expect app behavior to change. Do you want to see that done in this series? >> + case OPCODE_SCS: >> + if (fpi->DstReg.WriteMask & WRITEMASK_X) { >> + emit_math(SHADER_OPCODE_COS, regoffset(dst, 0), >> + regoffset(src[0], 0)); >> + } >> + >> + if (fpi->DstReg.WriteMask & WRITEMASK_Y) { >> + emit_math(SHADER_OPCODE_SIN, regoffset(dst, 1), >> + regoffset(src[0], 1)); >> + } >> + break; > > Future optimization: we could use the actual SINCOS math instruction > when asking for WRITEMASK_XY. But I don't know how common that is. I haven't seen one yet. If we did, given that GLSL doesn't have a sincos, we probably would want to do it as a peephole optimization. (I have seen sincos-applicable shaders on the vertex side, though. Not well-written ones, just examples)
pgpciDfRlTI50.pgp
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev